From patchwork Wed Mar 10 16:54:38 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 12128545 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 83333C433DB for ; Wed, 10 Mar 2021 16:54:54 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 00A2A64FC8 for ; Wed, 10 Mar 2021 16:54:53 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 00A2A64FC8 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 92B0E8D01D1; Wed, 10 Mar 2021 11:54:53 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 8DB0E8D01A6; Wed, 10 Mar 2021 11:54:53 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 707868D01D1; Wed, 10 Mar 2021 11:54:53 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 5090E8D01A6 for ; Wed, 10 Mar 2021 11:54:53 -0500 (EST) Received: from smtpin26.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 0C55687C9 for ; Wed, 10 Mar 2021 16:54:53 +0000 (UTC) X-FDA: 77904564066.26.45A23CA Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [216.205.24.124]) by imf27.hostedemail.com (Postfix) with ESMTP id 2A0128019143 for ; Wed, 10 Mar 2021 16:54:46 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1615395290; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=A1kUfhyDJENd62dFanXyJmXmUPU5LOfWFPVlFCBfXuY=; b=QCvtW+WUZh7Q1vySszxrjVT3kSPD2fvHo5rQJ0ge/73m6QQCaRc8Jcz1cTElgzGid5cMhb zImLEMWmuBn1ICTsXdeaObVYjsiKPO3Bg3fZOsjx+NAf8RvfjbTfNmxDUKTTrJ2kwPOlpB 6bAaHh4W5eFrNi9ZFDb2CArStFRV3fw= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-425-fKa2XOQ8P-uqTZcPTuM4MA-1; Wed, 10 Mar 2021 11:54:46 -0500 X-MC-Unique: fKa2XOQ8P-uqTZcPTuM4MA-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id DB738192FDA9; Wed, 10 Mar 2021 16:54:43 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-118-152.rdu2.redhat.com [10.10.118.152]) by smtp.corp.redhat.com (Postfix) with ESMTP id 058B75D6D7; Wed, 10 Mar 2021 16:54:38 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v4 01/28] iov_iter: Add ITER_XARRAY From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: Alexander Viro , "Matthew Wilcox (Oracle)" , Christoph Hellwig , linux-mm@kvack.org, linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 10 Mar 2021 16:54:38 +0000 Message-ID: <161539527815.286939.14607323792547049341.stgit@warthog.procyon.org.uk> In-Reply-To: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> References: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 2A0128019143 X-Stat-Signature: wyqxp81gp5544x4ghtqern7p75339bry Received-SPF: none (redhat.com>: No applicable sender policy available) receiver=imf27; identity=mailfrom; envelope-from=""; helo=us-smtp-delivery-124.mimecast.com; client-ip=216.205.24.124 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1615395286-39815 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Add an iterator, ITER_XARRAY, that walks through a set of pages attached to an xarray, starting at a given page and offset and walking for the specified amount of bytes. The iterator supports transparent huge pages. The iterate_xarray() macro calls the helper function with rcu_access() helped. I think that this is only a problem for iov_iter_for_each_range() - and that returns an error for ITER_XARRAY (also, this function does not appear to be called). The caller must guarantee that the pages are all present and they must be locked using PG_locked, PG_writeback or PG_fscache to prevent them from going away or being migrated whilst they're being accessed. This is useful for copying data from socket buffers to inodes in network filesystems and for transferring data between those inodes and the cache using direct I/O. Whilst it is true that ITER_BVEC could be used instead, that would require a bio_vec array to be allocated to refer to all the pages - which should be redundant if inode->i_pages also points to all these pages. Note that older versions of this patch implemented an ITER_MAPPING instead, which was almost the same. Signed-off-by: David Howells cc: Alexander Viro cc: Matthew Wilcox (Oracle) cc: Christoph Hellwig cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/3577430.1579705075@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/158861205740.340223.16592990225607814022.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/159465785214.1376674.6062549291411362531.stgit@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/160588477334.3465195.3608963255682568730.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161118129703.1232039.17141248432017826976.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161026313.2537118.14676007075365418649.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340386671.1303470.10752208972482479840.stgit@warthog.procyon.org.uk/ # v3 --- include/linux/uio.h | 11 ++ lib/iov_iter.c | 313 +++++++++++++++++++++++++++++++++++++++++++++++---- 2 files changed, 301 insertions(+), 23 deletions(-) diff --git a/include/linux/uio.h b/include/linux/uio.h index 27ff8eb786dc..5f5ffc45d4aa 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -10,6 +10,7 @@ #include struct page; +struct address_space; struct pipe_inode_info; struct kvec { @@ -24,6 +25,7 @@ enum iter_type { ITER_BVEC = 16, ITER_PIPE = 32, ITER_DISCARD = 64, + ITER_XARRAY = 128, }; struct iov_iter { @@ -39,6 +41,7 @@ struct iov_iter { const struct iovec *iov; const struct kvec *kvec; const struct bio_vec *bvec; + struct xarray *xarray; struct pipe_inode_info *pipe; }; union { @@ -47,6 +50,7 @@ struct iov_iter { unsigned int head; unsigned int start_head; }; + loff_t xarray_start; }; }; @@ -80,6 +84,11 @@ static inline bool iov_iter_is_discard(const struct iov_iter *i) return iov_iter_type(i) == ITER_DISCARD; } +static inline bool iov_iter_is_xarray(const struct iov_iter *i) +{ + return iov_iter_type(i) == ITER_XARRAY; +} + static inline unsigned char iov_iter_rw(const struct iov_iter *i) { return i->type & (READ | WRITE); @@ -221,6 +230,8 @@ void iov_iter_bvec(struct iov_iter *i, unsigned int direction, const struct bio_ void iov_iter_pipe(struct iov_iter *i, unsigned int direction, struct pipe_inode_info *pipe, size_t count); void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count); +void iov_iter_xarray(struct iov_iter *i, unsigned int direction, struct xarray *xarray, + loff_t start, size_t count); ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages, size_t maxsize, unsigned maxpages, size_t *start); ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, struct page ***pages, diff --git a/lib/iov_iter.c b/lib/iov_iter.c index f66c62aa7154..f808c625c11e 100644 --- a/lib/iov_iter.c +++ b/lib/iov_iter.c @@ -76,7 +76,44 @@ } \ } -#define iterate_all_kinds(i, n, v, I, B, K) { \ +#define iterate_xarray(i, n, __v, skip, STEP) { \ + struct page *head = NULL; \ + size_t wanted = n, seg, offset; \ + loff_t start = i->xarray_start + skip; \ + pgoff_t index = start >> PAGE_SHIFT; \ + int j; \ + \ + XA_STATE(xas, i->xarray, index); \ + \ + rcu_read_lock(); \ + xas_for_each(&xas, head, ULONG_MAX) { \ + if (xas_retry(&xas, head)) \ + continue; \ + if (WARN_ON(xa_is_value(head))) \ + break; \ + if (WARN_ON(PageHuge(head))) \ + break; \ + for (j = (head->index < index) ? index - head->index : 0; \ + j < thp_nr_pages(head); j++) { \ + __v.bv_page = head + j; \ + offset = (i->xarray_start + skip) & ~PAGE_MASK; \ + seg = PAGE_SIZE - offset; \ + __v.bv_offset = offset; \ + __v.bv_len = min(n, seg); \ + (void)(STEP); \ + n -= __v.bv_len; \ + skip += __v.bv_len; \ + if (n == 0) \ + break; \ + } \ + if (n == 0) \ + break; \ + } \ + rcu_read_unlock(); \ + n = wanted - n; \ +} + +#define iterate_all_kinds(i, n, v, I, B, K, X) { \ if (likely(n)) { \ size_t skip = i->iov_offset; \ if (unlikely(i->type & ITER_BVEC)) { \ @@ -88,6 +125,9 @@ struct kvec v; \ iterate_kvec(i, n, v, kvec, skip, (K)) \ } else if (unlikely(i->type & ITER_DISCARD)) { \ + } else if (unlikely(i->type & ITER_XARRAY)) { \ + struct bio_vec v; \ + iterate_xarray(i, n, v, skip, (X)); \ } else { \ const struct iovec *iov; \ struct iovec v; \ @@ -96,7 +136,7 @@ } \ } -#define iterate_and_advance(i, n, v, I, B, K) { \ +#define iterate_and_advance(i, n, v, I, B, K, X) { \ if (unlikely(i->count < n)) \ n = i->count; \ if (i->count) { \ @@ -121,6 +161,9 @@ i->kvec = kvec; \ } else if (unlikely(i->type & ITER_DISCARD)) { \ skip += n; \ + } else if (unlikely(i->type & ITER_XARRAY)) { \ + struct bio_vec v; \ + iterate_xarray(i, n, v, skip, (X)) \ } else { \ const struct iovec *iov; \ struct iovec v; \ @@ -622,7 +665,9 @@ size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i) copyout(v.iov_base, (from += v.iov_len) - v.iov_len, v.iov_len), memcpy_to_page(v.bv_page, v.bv_offset, (from += v.bv_len) - v.bv_len, v.bv_len), - memcpy(v.iov_base, (from += v.iov_len) - v.iov_len, v.iov_len) + memcpy(v.iov_base, (from += v.iov_len) - v.iov_len, v.iov_len), + memcpy_to_page(v.bv_page, v.bv_offset, + (from += v.bv_len) - v.bv_len, v.bv_len) ) return bytes; @@ -738,6 +783,16 @@ size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i) bytes = curr_addr - s_addr - rem; return bytes; } + }), + ({ + rem = copy_mc_to_page(v.bv_page, v.bv_offset, + (from += v.bv_len) - v.bv_len, v.bv_len); + if (rem) { + curr_addr = (unsigned long) from; + bytes = curr_addr - s_addr - rem; + rcu_read_unlock(); + return bytes; + } }) ) @@ -759,7 +814,9 @@ size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i) copyin((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len), memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page, v.bv_offset, v.bv_len), - memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len) + memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len), + memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page, + v.bv_offset, v.bv_len) ) return bytes; @@ -785,7 +842,9 @@ bool _copy_from_iter_full(void *addr, size_t bytes, struct iov_iter *i) 0;}), memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page, v.bv_offset, v.bv_len), - memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len) + memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len), + memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page, + v.bv_offset, v.bv_len) ) iov_iter_advance(i, bytes); @@ -805,7 +864,9 @@ size_t _copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i) v.iov_base, v.iov_len), memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page, v.bv_offset, v.bv_len), - memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len) + memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len), + memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page, + v.bv_offset, v.bv_len) ) return bytes; @@ -840,7 +901,9 @@ size_t _copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i) memcpy_page_flushcache((to += v.bv_len) - v.bv_len, v.bv_page, v.bv_offset, v.bv_len), memcpy_flushcache((to += v.iov_len) - v.iov_len, v.iov_base, - v.iov_len) + v.iov_len), + memcpy_page_flushcache((to += v.bv_len) - v.bv_len, v.bv_page, + v.bv_offset, v.bv_len) ) return bytes; @@ -864,7 +927,9 @@ bool _copy_from_iter_full_nocache(void *addr, size_t bytes, struct iov_iter *i) 0;}), memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page, v.bv_offset, v.bv_len), - memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len) + memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len), + memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page, + v.bv_offset, v.bv_len) ) iov_iter_advance(i, bytes); @@ -901,7 +966,7 @@ size_t copy_page_to_iter(struct page *page, size_t offset, size_t bytes, { if (unlikely(!page_copy_sane(page, offset, bytes))) return 0; - if (i->type & (ITER_BVEC|ITER_KVEC)) { + if (i->type & (ITER_BVEC | ITER_KVEC | ITER_XARRAY)) { void *kaddr = kmap_atomic(page); size_t wanted = copy_to_iter(kaddr + offset, bytes, i); kunmap_atomic(kaddr); @@ -924,7 +989,7 @@ size_t copy_page_from_iter(struct page *page, size_t offset, size_t bytes, WARN_ON(1); return 0; } - if (i->type & (ITER_BVEC|ITER_KVEC)) { + if (i->type & (ITER_BVEC | ITER_KVEC | ITER_XARRAY)) { void *kaddr = kmap_atomic(page); size_t wanted = _copy_from_iter(kaddr + offset, bytes, i); kunmap_atomic(kaddr); @@ -968,7 +1033,8 @@ size_t iov_iter_zero(size_t bytes, struct iov_iter *i) iterate_and_advance(i, bytes, v, clear_user(v.iov_base, v.iov_len), memzero_page(v.bv_page, v.bv_offset, v.bv_len), - memset(v.iov_base, 0, v.iov_len) + memset(v.iov_base, 0, v.iov_len), + memzero_page(v.bv_page, v.bv_offset, v.bv_len) ) return bytes; @@ -992,7 +1058,9 @@ size_t iov_iter_copy_from_user_atomic(struct page *page, copyin((p += v.iov_len) - v.iov_len, v.iov_base, v.iov_len), memcpy_from_page((p += v.bv_len) - v.bv_len, v.bv_page, v.bv_offset, v.bv_len), - memcpy((p += v.iov_len) - v.iov_len, v.iov_base, v.iov_len) + memcpy((p += v.iov_len) - v.iov_len, v.iov_base, v.iov_len), + memcpy_from_page((p += v.bv_len) - v.bv_len, v.bv_page, + v.bv_offset, v.bv_len) ) kunmap_atomic(kaddr); return bytes; @@ -1078,11 +1146,16 @@ void iov_iter_advance(struct iov_iter *i, size_t size) i->count -= size; return; } + if (unlikely(iov_iter_is_xarray(i))) { + i->iov_offset += size; + i->count -= size; + return; + } if (iov_iter_is_bvec(i)) { iov_iter_bvec_advance(i, size); return; } - iterate_and_advance(i, size, v, 0, 0, 0) + iterate_and_advance(i, size, v, 0, 0, 0, 0) } EXPORT_SYMBOL(iov_iter_advance); @@ -1126,7 +1199,12 @@ void iov_iter_revert(struct iov_iter *i, size_t unroll) return; } unroll -= i->iov_offset; - if (iov_iter_is_bvec(i)) { + if (iov_iter_is_xarray(i)) { + BUG(); /* We should never go beyond the start of the specified + * range since we might then be straying into pages that + * aren't pinned. + */ + } else if (iov_iter_is_bvec(i)) { const struct bio_vec *bvec = i->bvec; while (1) { size_t n = (--bvec)->bv_len; @@ -1163,9 +1241,9 @@ size_t iov_iter_single_seg_count(const struct iov_iter *i) return i->count; // it is a silly place, anyway if (i->nr_segs == 1) return i->count; - if (unlikely(iov_iter_is_discard(i))) + if (unlikely(iov_iter_is_discard(i) || iov_iter_is_xarray(i))) return i->count; - else if (iov_iter_is_bvec(i)) + if (iov_iter_is_bvec(i)) return min(i->count, i->bvec->bv_len - i->iov_offset); else return min(i->count, i->iov->iov_len - i->iov_offset); @@ -1213,6 +1291,31 @@ void iov_iter_pipe(struct iov_iter *i, unsigned int direction, } EXPORT_SYMBOL(iov_iter_pipe); +/** + * iov_iter_xarray - Initialise an I/O iterator to use the pages in an xarray + * @i: The iterator to initialise. + * @direction: The direction of the transfer. + * @xarray: The xarray to access. + * @start: The start file position. + * @count: The size of the I/O buffer in bytes. + * + * Set up an I/O iterator to either draw data out of the pages attached to an + * inode or to inject data into those pages. The pages *must* be prevented + * from evaporation, either by taking a ref on them or locking them by the + * caller. + */ +void iov_iter_xarray(struct iov_iter *i, unsigned int direction, + struct xarray *xarray, loff_t start, size_t count) +{ + BUG_ON(direction & ~1); + i->type = ITER_XARRAY | (direction & (READ | WRITE)); + i->xarray = xarray; + i->xarray_start = start; + i->count = count; + i->iov_offset = 0; +} +EXPORT_SYMBOL(iov_iter_xarray); + /** * iov_iter_discard - Initialise an I/O iterator that discards data * @i: The iterator to initialise. @@ -1246,7 +1349,8 @@ unsigned long iov_iter_alignment(const struct iov_iter *i) iterate_all_kinds(i, size, v, (res |= (unsigned long)v.iov_base | v.iov_len, 0), res |= v.bv_offset | v.bv_len, - res |= (unsigned long)v.iov_base | v.iov_len + res |= (unsigned long)v.iov_base | v.iov_len, + res |= v.bv_offset | v.bv_len ) return res; } @@ -1268,7 +1372,9 @@ unsigned long iov_iter_gap_alignment(const struct iov_iter *i) (res |= (!res ? 0 : (unsigned long)v.bv_offset) | (size != v.bv_len ? size : 0)), (res |= (!res ? 0 : (unsigned long)v.iov_base) | - (size != v.iov_len ? size : 0)) + (size != v.iov_len ? size : 0)), + (res |= (!res ? 0 : (unsigned long)v.bv_offset) | + (size != v.bv_len ? size : 0)) ); return res; } @@ -1318,6 +1424,75 @@ static ssize_t pipe_get_pages(struct iov_iter *i, return __pipe_get_pages(i, min(maxsize, capacity), pages, iter_head, start); } +static ssize_t iter_xarray_copy_pages(struct page **pages, struct xarray *xa, + pgoff_t index, unsigned int nr_pages) +{ + XA_STATE(xas, xa, index); + struct page *page; + unsigned int ret = 0; + + rcu_read_lock(); + for (page = xas_load(&xas); page; page = xas_next(&xas)) { + if (xas_retry(&xas, page)) + continue; + + /* Has the page moved or been split? */ + if (unlikely(page != xas_reload(&xas))) { + xas_reset(&xas); + continue; + } + + pages[ret] = find_subpage(page, xas.xa_index); + get_page(pages[ret]); + if (++ret == nr_pages) + break; + } + rcu_read_unlock(); + return ret; +} + +static ssize_t iter_xarray_get_pages(struct iov_iter *i, + struct page **pages, size_t maxsize, + unsigned maxpages, size_t *_start_offset) +{ + unsigned nr, offset; + pgoff_t index, count; + size_t size = maxsize, actual; + loff_t pos; + + if (!size || !maxpages) + return 0; + + pos = i->xarray_start + i->iov_offset; + index = pos >> PAGE_SHIFT; + offset = pos & ~PAGE_MASK; + *_start_offset = offset; + + count = 1; + if (size > PAGE_SIZE - offset) { + size -= PAGE_SIZE - offset; + count += size >> PAGE_SHIFT; + size &= ~PAGE_MASK; + if (size) + count++; + } + + if (count > maxpages) + count = maxpages; + + nr = iter_xarray_copy_pages(pages, i->xarray, index, count); + if (nr == 0) + return 0; + + actual = PAGE_SIZE * nr; + actual -= offset; + if (nr == count && size > 0) { + unsigned last_offset = (nr > 1) ? 0 : offset; + actual -= PAGE_SIZE - (last_offset + size); + } + return actual; +} + ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages, size_t maxsize, unsigned maxpages, size_t *start) @@ -1327,6 +1502,8 @@ ssize_t iov_iter_get_pages(struct iov_iter *i, if (unlikely(iov_iter_is_pipe(i))) return pipe_get_pages(i, pages, maxsize, maxpages, start); + if (unlikely(iov_iter_is_xarray(i))) + return iter_xarray_get_pages(i, pages, maxsize, maxpages, start); if (unlikely(iov_iter_is_discard(i))) return -EFAULT; @@ -1353,7 +1530,8 @@ ssize_t iov_iter_get_pages(struct iov_iter *i, return v.bv_len; }),({ return -EFAULT; - }) + }), + 0 ) return 0; } @@ -1397,6 +1575,51 @@ static ssize_t pipe_get_pages_alloc(struct iov_iter *i, return n; } +static ssize_t iter_xarray_get_pages_alloc(struct iov_iter *i, + struct page ***pages, size_t maxsize, + size_t *_start_offset) +{ + struct page **p; + unsigned nr, offset; + pgoff_t index, count; + size_t size = maxsize, actual; + loff_t pos; + + if (!size) + return 0; + + pos = i->xarray_start + i->iov_offset; + index = pos >> PAGE_SHIFT; + offset = pos & ~PAGE_MASK; + *_start_offset = offset; + + count = 1; + if (size > PAGE_SIZE - offset) { + size -= PAGE_SIZE - offset; + count += size >> PAGE_SHIFT; + size &= ~PAGE_MASK; + if (size) + count++; + } + + p = get_pages_array(count); + if (!p) + return -ENOMEM; + *pages = p; + + nr = iter_xarray_copy_pages(p, i->xarray, index, count); + if (nr == 0) + return 0; + + actual = PAGE_SIZE * nr; + actual -= offset; + if (nr == count && size > 0) { + unsigned last_offset = (nr > 1) ? 0 : offset; + actual -= PAGE_SIZE - (last_offset + size); + } + return actual; +} + ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, struct page ***pages, size_t maxsize, size_t *start) @@ -1408,6 +1631,8 @@ ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, if (unlikely(iov_iter_is_pipe(i))) return pipe_get_pages_alloc(i, pages, maxsize, start); + if (unlikely(iov_iter_is_xarray(i))) + return iter_xarray_get_pages_alloc(i, pages, maxsize, start); if (unlikely(iov_iter_is_discard(i))) return -EFAULT; @@ -1440,7 +1665,7 @@ ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, return v.bv_len; }),({ return -EFAULT; - }) + }), 0 ) return 0; } @@ -1478,6 +1703,13 @@ size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum, v.iov_base, v.iov_len, sum, off); off += v.iov_len; + }), ({ + char *p = kmap_atomic(v.bv_page); + sum = csum_and_memcpy((to += v.bv_len) - v.bv_len, + p + v.bv_offset, v.bv_len, + sum, off); + kunmap_atomic(p); + off += v.bv_len; }) ) *csum = sum; @@ -1519,6 +1751,13 @@ bool csum_and_copy_from_iter_full(void *addr, size_t bytes, __wsum *csum, v.iov_base, v.iov_len, sum, off); off += v.iov_len; + }), ({ + char *p = kmap_atomic(v.bv_page); + sum = csum_and_memcpy((to += v.bv_len) - v.bv_len, + p + v.bv_offset, v.bv_len, + sum, off); + kunmap_atomic(p); + off += v.bv_len; }) ) *csum = sum; @@ -1565,6 +1804,13 @@ size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *_csstate, (from += v.iov_len) - v.iov_len, v.iov_len, sum, off); off += v.iov_len; + }), ({ + char *p = kmap_atomic(v.bv_page); + sum = csum_and_memcpy(p + v.bv_offset, + (from += v.bv_len) - v.bv_len, + v.bv_len, sum, off); + kunmap_atomic(p); + off += v.bv_len; }) ) csstate->csum = sum; @@ -1615,6 +1861,21 @@ int iov_iter_npages(const struct iov_iter *i, int maxpages) npages = pipe_space_for_user(iter_head, pipe->tail, pipe); if (npages >= maxpages) return maxpages; + } else if (unlikely(iov_iter_is_xarray(i))) { + unsigned offset; + + offset = (i->xarray_start + i->iov_offset) & ~PAGE_MASK; + + npages = 1; + if (size > PAGE_SIZE - offset) { + size -= PAGE_SIZE - offset; + npages += size >> PAGE_SHIFT; + size &= ~PAGE_MASK; + if (size) + npages++; + } + if (npages >= maxpages) + return maxpages; } else iterate_all_kinds(i, size, v, ({ unsigned long p = (unsigned long)v.iov_base; npages += DIV_ROUND_UP(p + v.iov_len, PAGE_SIZE) @@ -1631,7 +1892,8 @@ int iov_iter_npages(const struct iov_iter *i, int maxpages) - p / PAGE_SIZE; if (npages >= maxpages) return maxpages; - }) + }), + 0 ) return npages; } @@ -1644,7 +1906,7 @@ const void *dup_iter(struct iov_iter *new, struct iov_iter *old, gfp_t flags) WARN_ON(1); return NULL; } - if (unlikely(iov_iter_is_discard(new))) + if (unlikely(iov_iter_is_discard(new) || iov_iter_is_xarray(new))) return NULL; if (iov_iter_is_bvec(new)) return new->bvec = kmemdup(new->bvec, @@ -1849,7 +2111,12 @@ int iov_iter_for_each_range(struct iov_iter *i, size_t bytes, kunmap(v.bv_page); err;}), ({ w = v; - err = f(&w, context);}) + err = f(&w, context);}), ({ + w.iov_base = kmap(v.bv_page) + v.bv_offset; + w.iov_len = v.bv_len; + err = f(&w, context); + kunmap(v.bv_page); + err;}) ) return err; } From patchwork Wed Mar 10 16:54:49 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 12128547 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 13F6EC433DB for ; Wed, 10 Mar 2021 16:55:06 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id AFB3B64FC5 for ; Wed, 10 Mar 2021 16:55:05 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org AFB3B64FC5 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 50AC08D01D2; Wed, 10 Mar 2021 11:55:05 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 4E19A8D01A6; Wed, 10 Mar 2021 11:55:05 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3D0868D01D2; Wed, 10 Mar 2021 11:55:05 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0074.hostedemail.com [216.40.44.74]) by kanga.kvack.org (Postfix) with ESMTP id 2380F8D01A6 for ; Wed, 10 Mar 2021 11:55:05 -0500 (EST) Received: from smtpin15.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id D4446181BD7A8 for ; Wed, 10 Mar 2021 16:55:04 +0000 (UTC) X-FDA: 77904564528.15.4DC33E3 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [216.205.24.124]) by imf25.hostedemail.com (Postfix) with ESMTP id 948FF6000103 for ; Wed, 10 Mar 2021 16:55:02 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1615395303; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=3JZ+E12Bv/rbfH/p1hytCimW8jzaBIx4fcjnzeRaUec=; b=YKZkIZO/bGfMnnrzrYQtGpf2dHkHS7pc/+0zNKjlEhk/qQDr46qtGMq1IvevD7ZPCR+m18 DPiAKtUEO6cBixdIctkzgMeTPDUF6wwgpBI0bKiQw9ten6n32mRuhCjI/NsCLTCCjDM2fE RMuRRXejgOg/Q2QrcJHUfkTcA80J9TU= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-169-hg6zAUlwNXOShk3xrouNGQ-1; Wed, 10 Mar 2021 11:55:02 -0500 X-MC-Unique: hg6zAUlwNXOShk3xrouNGQ-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 9CD7E19324A9; Wed, 10 Mar 2021 16:54:59 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-118-152.rdu2.redhat.com [10.10.118.152]) by smtp.corp.redhat.com (Postfix) with ESMTP id 854AA6062F; Wed, 10 Mar 2021 16:54:49 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v4 02/28] mm: Add an unlock function for PG_private_2/PG_fscache From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: Linus Torvalds , Christoph Hellwig , "Matthew Wilcox (Oracle)" , Alexander Viro , Christoph Hellwig , linux-mm@kvack.org, linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 10 Mar 2021 16:54:49 +0000 Message-ID: <161539528910.286939.1252328699383291173.stgit@warthog.procyon.org.uk> In-Reply-To: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> References: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Stat-Signature: ya3jdqxqzjjfqro9ye6hpcwgku1tdai4 X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: 948FF6000103 Received-SPF: none (redhat.com>: No applicable sender policy available) receiver=imf25; identity=mailfrom; envelope-from=""; helo=us-smtp-delivery-124.mimecast.com; client-ip=216.205.24.124 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1615395302-583683 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Add a function, unlock_page_private_2(), to unlock PG_private_2 analogous to that of PG_lock. Add a kerneldoc banner to that indicating the example usage case. A wrapper will need to be placed in the netfs header in the patch that adds that. [This implements a suggestion by Linus[1] to not mix the terminology of PG_private_2 and PG_fscache in the mm core function] Changes: - Remove extern from the declaration[2]. Suggested-by: Linus Torvalds Signed-off-by: David Howells Reviewed-by: Christoph Hellwig cc: Matthew Wilcox (Oracle) cc: Alexander Viro cc: Christoph Hellwig cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/1330473.1612974547@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/CAHk-=wjgA-74ddehziVk=XAEMTKswPu1Yw4uaro1R3ibs27ztw@mail.gmail.com/ [1] Link: https://lore.kernel.org/r/20210216102659.GA27714@lst.de/ [2] Link: https://lore.kernel.org/r/161340387944.1303470.7944159520278177652.stgit@warthog.procyon.org.uk/ # v3 --- include/linux/pagemap.h | 1 + mm/filemap.c | 20 ++++++++++++++++++++ 2 files changed, 21 insertions(+) diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 20225b067583..ac91b33f3873 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -591,6 +591,7 @@ extern int __lock_page_async(struct page *page, struct wait_page_queue *wait); extern int __lock_page_or_retry(struct page *page, struct mm_struct *mm, unsigned int flags); extern void unlock_page(struct page *page); +void unlock_page_private_2(struct page *page); /* * Return true if the page was successfully locked diff --git a/mm/filemap.c b/mm/filemap.c index 43700480d897..925964b67583 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1432,6 +1432,26 @@ void unlock_page(struct page *page) } EXPORT_SYMBOL(unlock_page); +/** + * unlock_page_private_2 - Unlock a page that's locked with PG_private_2 + * @page: The page + * + * Unlocks a page that's locked with PG_private_2 and wakes up sleepers in + * wait_on_page_private_2(). + * + * This is, for example, used when a netfs page is being written to a local + * disk cache, thereby allowing writes to the cache for the same page to be + * serialised. + */ +void unlock_page_private_2(struct page *page) +{ + page = compound_head(page); + VM_BUG_ON_PAGE(!PagePrivate2(page), page); + clear_bit_unlock(PG_private_2, &page->flags); + wake_up_page_bit(page, PG_private_2); +} +EXPORT_SYMBOL(unlock_page_private_2); + /** * end_page_writeback - end writeback against a page * @page: the page From patchwork Wed Mar 10 16:55:04 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 12128549 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E984EC433DB for ; Wed, 10 Mar 2021 16:55:18 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 9042564FC5 for ; Wed, 10 Mar 2021 16:55:18 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9042564FC5 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 3845A8D01D3; Wed, 10 Mar 2021 11:55:18 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 35B448D01A6; Wed, 10 Mar 2021 11:55:18 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 1FB848D01D3; Wed, 10 Mar 2021 11:55:18 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0253.hostedemail.com [216.40.44.253]) by kanga.kvack.org (Postfix) with ESMTP id 043D28D01A6 for ; Wed, 10 Mar 2021 11:55:17 -0500 (EST) Received: from smtpin23.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id AB5D945D8 for ; Wed, 10 Mar 2021 16:55:17 +0000 (UTC) X-FDA: 77904565074.23.3008F1B Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [216.205.24.124]) by imf21.hostedemail.com (Postfix) with ESMTP id 651F4E00010A for ; Wed, 10 Mar 2021 16:55:15 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1615395316; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=uVKrir/E2mYwb0N659VVvjNlhgvQd6dn5f4tWZkQLtk=; b=i3lEiTAo6V03Qk+DkKscOyNazU7gFUueZyvfSCCX4RX3T0d09/O0X6a19PvLjen6RNY8+3 emVLaehkGk+KaQcm2hW94bL1mE+eLMq3+CzomB8V7B4DggTdXj6lTv+kyFMVM9pZMkPy7x wcrFC5yIkmyOAlhCbURxO1R2ywm7UhA= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-183-lW15TNIWNVCptwrkSDAfDw-1; Wed, 10 Mar 2021 11:55:12 -0500 X-MC-Unique: lW15TNIWNVCptwrkSDAfDw-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 7204084B7DF; Wed, 10 Mar 2021 16:55:10 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-118-152.rdu2.redhat.com [10.10.118.152]) by smtp.corp.redhat.com (Postfix) with ESMTP id C20C15945E; Wed, 10 Mar 2021 16:55:06 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v4 03/28] mm: Implement readahead_control pageset expansion From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: "Matthew Wilcox (Oracle)" , "Matthew Wilcox (Oracle)" , Alexander Viro , Christoph Hellwig , linux-mm@kvack.org, linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 10 Mar 2021 16:55:04 +0000 Message-ID: <161539530488.286939.18085961677838089157.stgit@warthog.procyon.org.uk> In-Reply-To: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> References: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 X-Stat-Signature: ch8ha5fif9tehkwxgs31zed6dft4dkh8 X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: 651F4E00010A Received-SPF: none (redhat.com>: No applicable sender policy available) receiver=imf21; identity=mailfrom; envelope-from=""; helo=us-smtp-delivery-124.mimecast.com; client-ip=216.205.24.124 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1615395315-716306 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Provide a function, readahead_expand(), that expands the set of pages specified by a readahead_control object to encompass a revised area with a proposed size and length. The proposed area must include all of the old area and may be expanded yet more by this function so that the edges align on (transparent huge) page boundaries as allocated. The expansion will be cut short if a page already exists in either of the areas being expanded into. Note that any expansion made in such a case is not rolled back. This will be used by fscache so that reads can be expanded to cache granule boundaries, thereby allowing whole granules to be stored in the cache, but there are other potential users also. Changes: - Moved the declaration of readahead_expand() to a better place[1]. Suggested-by: Matthew Wilcox (Oracle) Signed-off-by: David Howells cc: Matthew Wilcox (Oracle) cc: Alexander Viro cc: Christoph Hellwig cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20210217161358.GM2858050@casper.infradead.org/ [1] Link: https://lore.kernel.org/r/159974633888.2094769.8326206446358128373.stgit@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/160588479816.3465195.553952688795241765.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161118131787.1232039.4863969952441067985.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161028670.2537118.13831420617039766044.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340389201.1303470.14353807284546854878.stgit@warthog.procyon.org.uk/ # v3 --- include/linux/pagemap.h | 2 + mm/readahead.c | 70 +++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 72 insertions(+) diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index ac91b33f3873..bf05e99ce588 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -818,6 +818,8 @@ void page_cache_sync_ra(struct readahead_control *, struct file_ra_state *, unsigned long req_count); void page_cache_async_ra(struct readahead_control *, struct file_ra_state *, struct page *, unsigned long req_count); +void readahead_expand(struct readahead_control *ractl, + loff_t new_start, size_t new_len); /** * page_cache_sync_readahead - generic file readahead diff --git a/mm/readahead.c b/mm/readahead.c index c5b0457415be..4446dada0bc2 100644 --- a/mm/readahead.c +++ b/mm/readahead.c @@ -638,3 +638,73 @@ SYSCALL_DEFINE3(readahead, int, fd, loff_t, offset, size_t, count) { return ksys_readahead(fd, offset, count); } + +/** + * readahead_expand - Expand a readahead request + * @ractl: The request to be expanded + * @new_start: The revised start + * @new_len: The revised size of the request + * + * Attempt to expand a readahead request outwards from the current size to the + * specified size by inserting locked pages before and after the current window + * to increase the size to the new window. This may involve the insertion of + * THPs, in which case the window may get expanded even beyond what was + * requested. + * + * The algorithm will stop if it encounters a conflicting page already in the + * pagecache and leave a smaller expansion than requested. + * + * The caller must check for this by examining the revised @ractl object for a + * different expansion than was requested. + */ +void readahead_expand(struct readahead_control *ractl, + loff_t new_start, size_t new_len) +{ + struct address_space *mapping = ractl->mapping; + pgoff_t new_index, new_nr_pages; + gfp_t gfp_mask = readahead_gfp_mask(mapping); + + new_index = new_start / PAGE_SIZE; + + /* Expand the leading edge downwards */ + while (ractl->_index > new_index) { + unsigned long index = ractl->_index - 1; + struct page *page = xa_load(&mapping->i_pages, index); + + if (page && !xa_is_value(page)) + return; /* Page apparently present */ + + page = __page_cache_alloc(gfp_mask); + if (!page) + return; + if (add_to_page_cache_lru(page, mapping, index, gfp_mask) < 0) { + put_page(page); + return; + } + + ractl->_nr_pages++; + ractl->_index = page->index; + } + + new_len += new_start - readahead_pos(ractl); + new_nr_pages = DIV_ROUND_UP(new_len, PAGE_SIZE); + + /* Expand the trailing edge upwards */ + while (ractl->_nr_pages < new_nr_pages) { + unsigned long index = ractl->_index + ractl->_nr_pages; + struct page *page = xa_load(&mapping->i_pages, index); + + if (page && !xa_is_value(page)) + return; /* Page apparently present */ + + page = __page_cache_alloc(gfp_mask); + if (!page) + return; + if (add_to_page_cache_lru(page, mapping, index, gfp_mask) < 0) { + put_page(page); + return; + } + ractl->_nr_pages++; + } +} +EXPORT_SYMBOL(readahead_expand); From patchwork Wed Mar 10 16:55:15 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 12128551 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C42E6C433DB for ; Wed, 10 Mar 2021 16:55:33 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 63B4964FC8 for ; Wed, 10 Mar 2021 16:55:33 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 63B4964FC8 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id E7F3B8D01D4; Wed, 10 Mar 2021 11:55:32 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id E56B18D01A6; Wed, 10 Mar 2021 11:55:32 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id CF7DB8D01D4; Wed, 10 Mar 2021 11:55:32 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0025.hostedemail.com [216.40.44.25]) by kanga.kvack.org (Postfix) with ESMTP id B5D678D01A6 for ; Wed, 10 Mar 2021 11:55:32 -0500 (EST) Received: from smtpin27.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 7A4F21730861 for ; Wed, 10 Mar 2021 16:55:32 +0000 (UTC) X-FDA: 77904565704.27.EB7CAD1 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [63.128.21.124]) by imf28.hostedemail.com (Postfix) with ESMTP id E365E20003A2 for ; Wed, 10 Mar 2021 16:55:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1615395331; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=dQFazTNg8ZNuoMbVCQ8NYcIVhTkKfyT97DX5gOD7eo0=; b=a1fhTWu6ZCRQLo3ZmEPQDGXf7CvCLkvp5gF6YF+4V7ZULE0c23l+Ncsi5S9UPAN0rfBToi OrF4CiNyV4KKlfxFvYEbh9urZHl0OKPStsWRZvR/I/2haHdH9hcbKnaxm/sxAu2hruMyMh f0Wp9ISnSftkqDxdDuZ4yviQ+7UQZ4M= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-244-azPSCIZBP_eQ-riT_DCfOQ-1; Wed, 10 Mar 2021 11:55:27 -0500 X-MC-Unique: azPSCIZBP_eQ-riT_DCfOQ-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 63F12CC64E; Wed, 10 Mar 2021 16:55:25 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-118-152.rdu2.redhat.com [10.10.118.152]) by smtp.corp.redhat.com (Postfix) with ESMTP id CD7AB60C13; Wed, 10 Mar 2021 16:55:16 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v4 04/28] netfs: Make a netfs helper module From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: Jeff Layton , Matthew Wilcox , linux-mm@kvack.org, linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 10 Mar 2021 16:55:15 +0000 Message-ID: <161539531569.286939.18317119181653706665.stgit@warthog.procyon.org.uk> In-Reply-To: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> References: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Stat-Signature: koakczeufafqjefsziuaezy38rd91y33 X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: E365E20003A2 Received-SPF: none (redhat.com>: No applicable sender policy available) receiver=imf28; identity=mailfrom; envelope-from=""; helo=us-smtp-delivery-124.mimecast.com; client-ip=63.128.21.124 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1615395332-30733 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Make a netfs helper module to manage read request segmentation, caching support and transparent huge page support on behalf of a network filesystem. Signed-off-by: David Howells Reviewed-by: Jeff Layton cc: Matthew Wilcox cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/160588496284.3465195.10102643717770106661.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161118135638.1232039.1622182202673126285.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161031028.2537118.1213974428943508753.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340391427.1303470.14884950716721956560.stgit@warthog.procyon.org.uk/ # v3 --- fs/netfs/Kconfig | 8 ++++++++ 1 file changed, 8 insertions(+) create mode 100644 fs/netfs/Kconfig diff --git a/fs/netfs/Kconfig b/fs/netfs/Kconfig new file mode 100644 index 000000000000..2ebf90e6ca95 --- /dev/null +++ b/fs/netfs/Kconfig @@ -0,0 +1,8 @@ +# SPDX-License-Identifier: GPL-2.0-only + +config NETFS_SUPPORT + tristate "Support for network filesystem high-level I/O" + help + This option enables support for network filesystems, including + helpers for high-level buffered I/O, abstracting out read + segmentation, local caching and transparent huge page support. From patchwork Wed Mar 10 16:55:45 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 12128553 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 43F7BC433DB for ; Wed, 10 Mar 2021 16:56:03 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id D5BA264FC7 for ; Wed, 10 Mar 2021 16:56:02 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D5BA264FC7 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 57E368D01A6; Wed, 10 Mar 2021 11:56:02 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 5554C8D01D5; Wed, 10 Mar 2021 11:56:02 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3DCD28D01A6; Wed, 10 Mar 2021 11:56:02 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0180.hostedemail.com [216.40.44.180]) by kanga.kvack.org (Postfix) with ESMTP id 225E78D01A6 for ; Wed, 10 Mar 2021 11:56:02 -0500 (EST) Received: from smtpin16.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id C05E883F0 for ; Wed, 10 Mar 2021 16:56:01 +0000 (UTC) X-FDA: 77904566922.16.7E3EF2A Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [63.128.21.124]) by imf14.hostedemail.com (Postfix) with ESMTP id 6E324C0007CE for ; Wed, 10 Mar 2021 16:55:56 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1615395360; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=1RYVoXzqFKdR4U0WYTzkOXoOXi2/ID7WAnru9qnTxRg=; b=RBxfGvWctO/gqY2jK2QaGmYhKTX3dWEmGjR4GkwJk07i/2d1ezQkp0UV/5gD2wiuxGgxrL 9dcmLmu0b7NEi6dKG8w1BVeByawLZZYh9+PaQu/d0ynv1BNOITiYVYbVzeCJbakzDaMaNu JC4qz656/3KgaJwtAT5sgIDFMA9y678= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-380-ruTx7nFRORShHGyMDeCvOg-1; Wed, 10 Mar 2021 11:55:57 -0500 X-MC-Unique: ruTx7nFRORShHGyMDeCvOg-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id B423210449B3; Wed, 10 Mar 2021 16:55:55 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-118-152.rdu2.redhat.com [10.10.118.152]) by smtp.corp.redhat.com (Postfix) with ESMTP id F24D060C6E; Wed, 10 Mar 2021 16:55:45 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v4 06/28] netfs, mm: Move PG_fscache helper funcs to linux/netfs.h From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: Matthew Wilcox , linux-mm@kvack.org, linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 10 Mar 2021 16:55:45 +0000 Message-ID: <161539534516.286939.6265142985563005000.stgit@warthog.procyon.org.uk> In-Reply-To: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> References: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Stat-Signature: joyt59p5596jwsrwdwpyzhjb7qnf1oz6 X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 6E324C0007CE Received-SPF: none (redhat.com>: No applicable sender policy available) receiver=imf14; identity=mailfrom; envelope-from=""; helo=us-smtp-delivery-124.mimecast.com; client-ip=63.128.21.124 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1615395356-296562 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Move the PG_fscache related helper funcs (such as SetPageFsCache()) to linux/netfs.h rather than linux/fscache.h as the intention is to move to a model where they're used by the network filesystem and the helper library, but not by fscache/cachefiles itself. Signed-off-by: David Howells cc: Matthew Wilcox cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/161340392347.1303470.18065131603507621762.stgit@warthog.procyon.org.uk/ # v3 --- include/linux/fscache.h | 11 +---------- include/linux/netfs.h | 29 +++++++++++++++++++++++++++++ 2 files changed, 30 insertions(+), 10 deletions(-) create mode 100644 include/linux/netfs.h diff --git a/include/linux/fscache.h b/include/linux/fscache.h index a1c928fe98e7..1f8dc72369ee 100644 --- a/include/linux/fscache.h +++ b/include/linux/fscache.h @@ -19,6 +19,7 @@ #include #include #include +#include #if defined(CONFIG_FSCACHE) || defined(CONFIG_FSCACHE_MODULE) #define fscache_available() (1) @@ -29,16 +30,6 @@ #endif -/* - * overload PG_private_2 to give us PG_fscache - this is used to indicate that - * a page is currently backed by a local disk cache - */ -#define PageFsCache(page) PagePrivate2((page)) -#define SetPageFsCache(page) SetPagePrivate2((page)) -#define ClearPageFsCache(page) ClearPagePrivate2((page)) -#define TestSetPageFsCache(page) TestSetPagePrivate2((page)) -#define TestClearPageFsCache(page) TestClearPagePrivate2((page)) - /* pattern used to fill dead space in an index entry */ #define FSCACHE_INDEX_DEADFILL_PATTERN 0x79 diff --git a/include/linux/netfs.h b/include/linux/netfs.h new file mode 100644 index 000000000000..cc1102040488 --- /dev/null +++ b/include/linux/netfs.h @@ -0,0 +1,29 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* Network filesystem support services. + * + * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + * + * See: + * + * Documentation/filesystems/netfs_library.rst + * + * for a description of the network filesystem interface declared here. + */ + +#ifndef _LINUX_NETFS_H +#define _LINUX_NETFS_H + +#include + +/* + * Overload PG_private_2 to give us PG_fscache - this is used to indicate that + * a page is currently backed by a local disk cache + */ +#define PageFsCache(page) PagePrivate2((page)) +#define SetPageFsCache(page) SetPagePrivate2((page)) +#define ClearPageFsCache(page) ClearPagePrivate2((page)) +#define TestSetPageFsCache(page) TestSetPagePrivate2((page)) +#define TestClearPageFsCache(page) TestClearPagePrivate2((page)) + +#endif /* _LINUX_NETFS_H */ From patchwork Wed Mar 10 16:56:00 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 12128555 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 95E09C433E0 for ; Wed, 10 Mar 2021 16:56:14 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 3343964FC5 for ; Wed, 10 Mar 2021 16:56:14 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3343964FC5 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id C83B78D01D6; Wed, 10 Mar 2021 11:56:13 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id C5A028D01D5; Wed, 10 Mar 2021 11:56:13 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id AFDD68D01D6; Wed, 10 Mar 2021 11:56:13 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0169.hostedemail.com [216.40.44.169]) by kanga.kvack.org (Postfix) with ESMTP id 969978D01D5 for ; Wed, 10 Mar 2021 11:56:13 -0500 (EST) Received: from smtpin03.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 5171D82499A8 for ; Wed, 10 Mar 2021 16:56:13 +0000 (UTC) X-FDA: 77904567426.03.DCCD4D3 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [63.128.21.124]) by imf24.hostedemail.com (Postfix) with ESMTP id 0236CA0009E2 for ; Wed, 10 Mar 2021 16:56:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1615395372; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=3RkMaKS84Erw5iqrERot0PwU2wMGkYNCFnxXlts7pQU=; b=cZvc+RPOpdqi2rwFzA+mVbQyFjOrgjS/c+M4dh25FVErPIzJmfH6/MxwFQweQ0+2Rx8URn MMfeNV/mwLl0+4eLIzChmWeWtzg1posGSsIn0kHXhmBCphlDnV/5u1eK4QxKFXQgJr+Ybo viZ25wYQojw7MIlVAKRn7LuF8HK7LPU= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-596-w3Zuq-KbOdGbIIgAEsl6Sg-1; Wed, 10 Mar 2021 11:56:10 -0500 X-MC-Unique: w3Zuq-KbOdGbIIgAEsl6Sg-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 83A101934106; Wed, 10 Mar 2021 16:56:08 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-118-152.rdu2.redhat.com [10.10.118.152]) by smtp.corp.redhat.com (Postfix) with ESMTP id 537A010013C1; Wed, 10 Mar 2021 16:56:01 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v4 07/28] netfs, mm: Add unlock_page_fscache() and wait_on_page_fscache() From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: Linus Torvalds , Matthew Wilcox , linux-mm@kvack.org, linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 10 Mar 2021 16:56:00 +0000 Message-ID: <161539536093.286939.5076448803512118764.stgit@warthog.procyon.org.uk> In-Reply-To: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> References: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Stat-Signature: 6jn6hi97sw7rxoqfyqahz49ghsgcjmp6 X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: 0236CA0009E2 Received-SPF: none (redhat.com>: No applicable sender policy available) receiver=imf24; identity=mailfrom; envelope-from=""; helo=us-smtp-delivery-124.mimecast.com; client-ip=63.128.21.124 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1615395368-111844 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Add unlock_page_fscache() as an alias of unlock_page_private_2(). This allows a page 'locked' with PG_fscache to be unlocked. Add wait_on_page_fscache() to wait for PG_fscache to be unlocked. [Linus suggested putting the fscache-themed functions into the caching-specific headers rather than pagemap.h[1]] Signed-off-by: David Howells cc: Linus Torvalds cc: Matthew Wilcox cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/1330473.1612974547@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/CAHk-=wjgA-74ddehziVk=XAEMTKswPu1Yw4uaro1R3ibs27ztw@mail.gmail.com/ [1] Link: https://lore.kernel.org/r/161340393568.1303470.4997526899111310530.stgit@warthog.procyon.org.uk/ # v3 --- include/linux/netfs.h | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/include/linux/netfs.h b/include/linux/netfs.h index cc1102040488..5ed8c6dc7453 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -26,4 +26,33 @@ #define TestSetPageFsCache(page) TestSetPagePrivate2((page)) #define TestClearPageFsCache(page) TestClearPagePrivate2((page)) +/** + * unlock_page_fscache - Unlock a page that's locked with PG_fscache + * @page: The page + * + * Unlocks a page that's locked with PG_fscache and wakes up sleepers in + * wait_on_page_fscache(). This page bit is used by the netfs helpers when a + * netfs page is being written to a local disk cache, thereby allowing writes + * to the cache for the same page to be serialised. + */ +static inline void unlock_page_fscache(struct page *page) +{ + unlock_page_private_2(page); +} + +/** + * wait_on_page_fscache - Wait for PG_fscache to be cleared on a page + * @page: The page + * + * Wait for the PG_fscache (PG_private_2) page bit to be removed from a page. + * This is, for example, used to handle a netfs page being written to a local + * disk cache, thereby allowing writes to the cache for the same page to be + * serialised. + */ +static inline void wait_on_page_fscache(struct page *page) +{ + if (PageFsCache(page)) + wait_on_page_bit(compound_head(page), PG_fscache); +} + #endif /* _LINUX_NETFS_H */ From patchwork Wed Mar 10 16:56:13 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 12128557 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id EBDD3C433DB for ; Wed, 10 Mar 2021 16:56:32 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 68F4464FC8 for ; Wed, 10 Mar 2021 16:56:32 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 68F4464FC8 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id E306E8D01D7; Wed, 10 Mar 2021 11:56:31 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id E07798D01D5; Wed, 10 Mar 2021 11:56:31 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C32C08D01D7; Wed, 10 Mar 2021 11:56:31 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0099.hostedemail.com [216.40.44.99]) by kanga.kvack.org (Postfix) with ESMTP id 9DC548D01D5 for ; Wed, 10 Mar 2021 11:56:31 -0500 (EST) Received: from smtpin25.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 5A67F247A for ; Wed, 10 Mar 2021 16:56:31 +0000 (UTC) X-FDA: 77904568182.25.A6BE547 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [216.205.24.124]) by imf26.hostedemail.com (Postfix) with ESMTP id 1D8B24080F72 for ; Wed, 10 Mar 2021 16:56:24 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1615395386; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=FPJX7b9dndGN393JTy0OyOUFEb/PbK+waIrfRf/dU0U=; b=I3BHJtpcy4yrI9qF/0MMZ/CxeYLQMq/TEDvfz6/XI5efEyAISkBJz0Ep9WFSebPLZ8KdlF wHAuIuKwO4g+tGrAdsaog4F6k9kPn1P7KFMp+ME34m1YUg7Uqk79A9iohFVklzKfFkgGRy 6hqCCehO/jhFemcBFbqo4W3DJKynf6E= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-329-GGcgcgwdMEacjZs1atKNYw-1; Wed, 10 Mar 2021 11:56:23 -0500 X-MC-Unique: GGcgcgwdMEacjZs1atKNYw-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id AEF3210866A6; Wed, 10 Mar 2021 16:56:21 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-118-152.rdu2.redhat.com [10.10.118.152]) by smtp.corp.redhat.com (Postfix) with ESMTP id 1E91446; Wed, 10 Mar 2021 16:56:14 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v4 08/28] netfs: Provide readahead and readpage netfs helpers From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: Jeff Layton , Matthew Wilcox , linux-mm@kvack.org, linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 10 Mar 2021 16:56:13 +0000 Message-ID: <161539537375.286939.16642940088716990995.stgit@warthog.procyon.org.uk> In-Reply-To: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> References: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-Stat-Signature: a7fhyph3dhtu7riwk9feon668a4d3ktj X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 1D8B24080F72 Received-SPF: none (redhat.com>: No applicable sender policy available) receiver=imf26; identity=mailfrom; envelope-from=""; helo=us-smtp-delivery-124.mimecast.com; client-ip=216.205.24.124 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1615395384-956868 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Add a pair of helper functions: (*) netfs_readahead() (*) netfs_readpage() to do the work of handling a readahead or a readpage, where the page(s) that form part of the request may be split between the local cache, the server or just require clearing, and may be single pages and transparent huge pages. This is all handled within the helper. Note that while both will read from the cache if there is data present, only netfs_readahead() will expand the request beyond what it was asked to do, and only netfs_readahead() will write back to the cache. netfs_readpage(), on the other hand, is synchronous and only fetches the page (which might be a THP) it is asked for. The netfs gives the helper parameters from the VM, the cache cookie it wants to use (or NULL) and a table of operations (only one of which is mandatory): (*) expand_readahead() [optional] Called to allow the netfs to request an expansion of a readahead request to meet its own alignment requirements. This is done by changing rreq->start and rreq->len. (*) clamp_length() [optional] Called to allow the netfs to cut down a subrequest to meet its own boundary requirements. If it does this, the helper will generate additional subrequests until the full request is satisfied. (*) is_still_valid() [optional] Called to find out if the data just read from the cache has been invalidated and must be reread from the server. (*) issue_op() [required] Called to ask the netfs to issue a read to the server. The subrequest describes the read. The read request holds information about the file being accessed. The netfs can cache information in rreq->netfs_priv. Upon completion, the netfs should set the error, transferred and can also set FSCACHE_SREQ_CLEAR_TAIL and then call fscache_subreq_terminated(). (*) done() [optional] Called after the pages have been unlocked. The read request is still pinning the file and mapping and may still be pinning pages with PG_fscache. rreq->error indicates any error that has been accumulated. (*) cleanup() [optional] Called when the helper is disposing of a finished read request. This allows the netfs to clear rreq->netfs_priv. Netfs support is enabled with CONFIG_NETFS_SUPPORT=y. It will be built even if CONFIG_FSCACHE=n and in this case much of it should be optimised away, allowing the filesystem to use it even when caching is disabled. Changes: - Folded in a kerneldoc comment fix. - Folded in a fix for the error handling in the case that ENOMEM occurs. - Added flag to netfs_subreq_terminated() to indicate that the caller may have been running async and stuff that might sleep needs punting to a workqueue (can't use in_softirq()[1]). Signed-off-by: David Howells Reviewed-by: Jeff Layton cc: Matthew Wilcox cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/160588497406.3465195.18003475695899726222.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161118136849.1232039.8923686136144228724.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161032290.2537118.13400578415247339173.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340394873.1303470.6237319335883242536.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/20210216084230.GA23669@lst.de/ [1] --- fs/Kconfig | 1 fs/Makefile | 1 fs/netfs/Makefile | 6 fs/netfs/internal.h | 61 ++++ fs/netfs/read_helper.c | 717 ++++++++++++++++++++++++++++++++++++++++++++++++ include/linux/netfs.h | 82 +++++ 6 files changed, 868 insertions(+) create mode 100644 fs/netfs/Makefile create mode 100644 fs/netfs/internal.h create mode 100644 fs/netfs/read_helper.c diff --git a/fs/Kconfig b/fs/Kconfig index 462253ae483a..eccbcf1e3f2e 100644 --- a/fs/Kconfig +++ b/fs/Kconfig @@ -125,6 +125,7 @@ source "fs/overlayfs/Kconfig" menu "Caches" +source "fs/netfs/Kconfig" source "fs/fscache/Kconfig" source "fs/cachefiles/Kconfig" diff --git a/fs/Makefile b/fs/Makefile index 3215fe205256..9c708e1fbe8f 100644 --- a/fs/Makefile +++ b/fs/Makefile @@ -67,6 +67,7 @@ obj-y += devpts/ obj-$(CONFIG_DLM) += dlm/ # Do not add any filesystems before this line +obj-$(CONFIG_NETFS_SUPPORT) += netfs/ obj-$(CONFIG_FSCACHE) += fscache/ obj-$(CONFIG_REISERFS_FS) += reiserfs/ obj-$(CONFIG_EXT4_FS) += ext4/ diff --git a/fs/netfs/Makefile b/fs/netfs/Makefile new file mode 100644 index 000000000000..4b4eff2ba369 --- /dev/null +++ b/fs/netfs/Makefile @@ -0,0 +1,6 @@ +# SPDX-License-Identifier: GPL-2.0 + +netfs-y := \ + read_helper.o + +obj-$(CONFIG_NETFS_SUPPORT) := netfs.o diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h new file mode 100644 index 000000000000..ee665c0e7dc8 --- /dev/null +++ b/fs/netfs/internal.h @@ -0,0 +1,61 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* Internal definitions for network filesystem support + * + * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ + +#ifdef pr_fmt +#undef pr_fmt +#endif + +#define pr_fmt(fmt) "netfs: " fmt + +/* + * read_helper.c + */ +extern unsigned int netfs_debug; + +#define netfs_stat(x) do {} while(0) +#define netfs_stat_d(x) do {} while(0) + +/*****************************************************************************/ +/* + * debug tracing + */ +#define dbgprintk(FMT, ...) \ + printk("[%-6.6s] "FMT"\n", current->comm, ##__VA_ARGS__) + +#define kenter(FMT, ...) dbgprintk("==> %s("FMT")", __func__, ##__VA_ARGS__) +#define kleave(FMT, ...) dbgprintk("<== %s()"FMT"", __func__, ##__VA_ARGS__) +#define kdebug(FMT, ...) dbgprintk(FMT, ##__VA_ARGS__) + +#ifdef __KDEBUG +#define _enter(FMT, ...) kenter(FMT, ##__VA_ARGS__) +#define _leave(FMT, ...) kleave(FMT, ##__VA_ARGS__) +#define _debug(FMT, ...) kdebug(FMT, ##__VA_ARGS__) + +#elif defined(CONFIG_NETFS_DEBUG) +#define _enter(FMT, ...) \ +do { \ + if (netfs_debug) \ + kenter(FMT, ##__VA_ARGS__); \ +} while (0) + +#define _leave(FMT, ...) \ +do { \ + if (netfs_debug) \ + kleave(FMT, ##__VA_ARGS__); \ +} while (0) + +#define _debug(FMT, ...) \ +do { \ + if (netfs_debug) \ + kdebug(FMT, ##__VA_ARGS__); \ +} while (0) + +#else +#define _enter(FMT, ...) no_printk("==> %s("FMT")", __func__, ##__VA_ARGS__) +#define _leave(FMT, ...) no_printk("<== %s()"FMT"", __func__, ##__VA_ARGS__) +#define _debug(FMT, ...) no_printk(FMT, ##__VA_ARGS__) +#endif diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c new file mode 100644 index 000000000000..c5ed0dcc9571 --- /dev/null +++ b/fs/netfs/read_helper.c @@ -0,0 +1,717 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* Network filesystem high-level read support. + * + * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "internal.h" + +MODULE_DESCRIPTION("Network fs support"); +MODULE_AUTHOR("Red Hat, Inc."); +MODULE_LICENSE("GPL"); + +unsigned netfs_debug; +module_param_named(debug, netfs_debug, uint, S_IWUSR | S_IRUGO); +MODULE_PARM_DESC(netfs_debug, "Netfs support debugging mask"); + +static void netfs_rreq_work(struct work_struct *); +static void __netfs_put_subrequest(struct netfs_read_subrequest *, bool); + +static void netfs_put_subrequest(struct netfs_read_subrequest *subreq, + bool was_async) +{ + if (refcount_dec_and_test(&subreq->usage)) + __netfs_put_subrequest(subreq, was_async); +} + +static struct netfs_read_request *netfs_alloc_read_request( + const struct netfs_read_request_ops *ops, void *netfs_priv, + struct file *file) +{ + struct netfs_read_request *rreq; + + rreq = kzalloc(sizeof(struct netfs_read_request), GFP_KERNEL); + if (rreq) { + rreq->netfs_ops = ops; + rreq->netfs_priv = netfs_priv; + rreq->inode = file_inode(file); + rreq->i_size = i_size_read(rreq->inode); + INIT_LIST_HEAD(&rreq->subrequests); + INIT_WORK(&rreq->work, netfs_rreq_work); + refcount_set(&rreq->usage, 1); + __set_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags); + ops->init_rreq(rreq, file); + } + + return rreq; +} + +static void netfs_get_read_request(struct netfs_read_request *rreq) +{ + refcount_inc(&rreq->usage); +} + +static void netfs_rreq_clear_subreqs(struct netfs_read_request *rreq, + bool was_async) +{ + struct netfs_read_subrequest *subreq; + + while (!list_empty(&rreq->subrequests)) { + subreq = list_first_entry(&rreq->subrequests, + struct netfs_read_subrequest, rreq_link); + list_del(&subreq->rreq_link); + netfs_put_subrequest(subreq, was_async); + } +} + +static void netfs_free_read_request(struct work_struct *work) +{ + struct netfs_read_request *rreq = + container_of(work, struct netfs_read_request, work); + netfs_rreq_clear_subreqs(rreq, false); + if (rreq->netfs_priv) + rreq->netfs_ops->cleanup(rreq->mapping, rreq->netfs_priv); + kfree(rreq); +} + +static void netfs_put_read_request(struct netfs_read_request *rreq, bool was_async) +{ + if (refcount_dec_and_test(&rreq->usage)) { + if (was_async) { + rreq->work.func = netfs_free_read_request; + if (!queue_work(system_unbound_wq, &rreq->work)) + BUG(); + } else { + netfs_free_read_request(&rreq->work); + } + } +} + +/* + * Allocate and partially initialise an I/O request structure. + */ +static struct netfs_read_subrequest *netfs_alloc_subrequest( + struct netfs_read_request *rreq) +{ + struct netfs_read_subrequest *subreq; + + subreq = kzalloc(sizeof(struct netfs_read_subrequest), GFP_KERNEL); + if (subreq) { + INIT_LIST_HEAD(&subreq->rreq_link); + refcount_set(&subreq->usage, 2); + subreq->rreq = rreq; + netfs_get_read_request(rreq); + } + + return subreq; +} + +static void netfs_get_read_subrequest(struct netfs_read_subrequest *subreq) +{ + refcount_inc(&subreq->usage); +} + +static void __netfs_put_subrequest(struct netfs_read_subrequest *subreq, + bool was_async) +{ + struct netfs_read_request *rreq = subreq->rreq; + + kfree(subreq); + netfs_put_read_request(rreq, was_async); +} + +/* + * Clear the unread part of an I/O request. + */ +static void netfs_clear_unread(struct netfs_read_subrequest *subreq) +{ + struct iov_iter iter; + + iov_iter_xarray(&iter, WRITE, &subreq->rreq->mapping->i_pages, + subreq->start + subreq->transferred, + subreq->len - subreq->transferred); + iov_iter_zero(iov_iter_count(&iter), &iter); +} + +/* + * Fill a subrequest region with zeroes. + */ +static void netfs_fill_with_zeroes(struct netfs_read_request *rreq, + struct netfs_read_subrequest *subreq) +{ + __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); + netfs_subreq_terminated(subreq, 0, false); +} + +/* + * Ask the netfs to issue a read request to the server for us. + * + * The netfs is expected to read from subreq->pos + subreq->transferred to + * subreq->pos + subreq->len - 1. It may not backtrack and write data into the + * buffer prior to the transferred point as it might clobber dirty data + * obtained from the cache. + * + * Alternatively, the netfs is allowed to indicate one of two things: + * + * - NETFS_SREQ_SHORT_READ: A short read - it will get called again to try and + * make progress. + * + * - NETFS_SREQ_CLEAR_TAIL: A short read - the rest of the buffer will be + * cleared. + */ +static void netfs_read_from_server(struct netfs_read_request *rreq, + struct netfs_read_subrequest *subreq) +{ + rreq->netfs_ops->issue_op(subreq); +} + +/* + * Release those waiting. + */ +static void netfs_rreq_completed(struct netfs_read_request *rreq, bool was_async) +{ + netfs_rreq_clear_subreqs(rreq, was_async); + netfs_put_read_request(rreq, was_async); +} + +/* + * Unlock the pages in a read operation. We need to set PG_fscache on any + * pages we're going to write back before we unlock them. + */ +static void netfs_rreq_unlock(struct netfs_read_request *rreq) +{ + struct netfs_read_subrequest *subreq; + struct page *page; + unsigned int iopos, account = 0; + pgoff_t start_page = rreq->start / PAGE_SIZE; + pgoff_t last_page = ((rreq->start + rreq->len) / PAGE_SIZE) - 1; + bool subreq_failed = false; + int i; + + XA_STATE(xas, &rreq->mapping->i_pages, start_page); + + if (test_bit(NETFS_RREQ_FAILED, &rreq->flags)) { + __clear_bit(NETFS_RREQ_WRITE_TO_CACHE, &rreq->flags); + list_for_each_entry(subreq, &rreq->subrequests, rreq_link) { + __clear_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags); + } + } + + /* Walk through the pagecache and the I/O request lists simultaneously. + * We may have a mixture of cached and uncached sections and we only + * really want to write out the uncached sections. This is slightly + * complicated by the possibility that we might have huge pages with a + * mixture inside. + */ + subreq = list_first_entry(&rreq->subrequests, + struct netfs_read_subrequest, rreq_link); + iopos = 0; + subreq_failed = (subreq->error < 0); + + rcu_read_lock(); + xas_for_each(&xas, page, last_page) { + unsigned int pgpos = (page->index - start_page) * PAGE_SIZE; + unsigned int pgend = pgpos + thp_size(page); + bool pg_failed = false; + + for (;;) { + if (!subreq) { + pg_failed = true; + break; + } + if (test_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags)) + SetPageFsCache(page); + pg_failed |= subreq_failed; + if (pgend < iopos + subreq->len) + break; + + account += subreq->transferred; + iopos += subreq->len; + if (!list_is_last(&subreq->rreq_link, &rreq->subrequests)) { + subreq = list_next_entry(subreq, rreq_link); + subreq_failed = (subreq->error < 0); + } else { + subreq = NULL; + subreq_failed = false; + } + if (pgend == iopos) + break; + } + + if (!pg_failed) { + for (i = 0; i < thp_nr_pages(page); i++) + flush_dcache_page(page); + SetPageUptodate(page); + } + + if (!test_bit(NETFS_RREQ_DONT_UNLOCK_PAGES, &rreq->flags)) { + if (page->index == rreq->no_unlock_page && + test_bit(NETFS_RREQ_NO_UNLOCK_PAGE, &rreq->flags)) + _debug("no unlock"); + else + unlock_page(page); + } + } + rcu_read_unlock(); + + task_io_account_read(account); + if (rreq->netfs_ops->done) + rreq->netfs_ops->done(rreq); +} + +/* + * Handle a short read. + */ +static void netfs_rreq_short_read(struct netfs_read_request *rreq, + struct netfs_read_subrequest *subreq) +{ + __clear_bit(NETFS_SREQ_SHORT_READ, &subreq->flags); + __set_bit(NETFS_SREQ_SEEK_DATA_READ, &subreq->flags); + + netfs_get_read_subrequest(subreq); + atomic_inc(&rreq->nr_rd_ops); + netfs_read_from_server(rreq, subreq); +} + +/* + * Resubmit any short or failed operations. Returns true if we got the rreq + * ref back. + */ +static bool netfs_rreq_perform_resubmissions(struct netfs_read_request *rreq) +{ + struct netfs_read_subrequest *subreq; + + WARN_ON(in_interrupt()); + + /* We don't want terminating submissions trying to wake us up whilst + * we're still going through the list. + */ + atomic_inc(&rreq->nr_rd_ops); + + __clear_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags); + list_for_each_entry(subreq, &rreq->subrequests, rreq_link) { + if (subreq->error) { + if (subreq->source != NETFS_READ_FROM_CACHE) + break; + subreq->source = NETFS_DOWNLOAD_FROM_SERVER; + subreq->error = 0; + netfs_get_read_subrequest(subreq); + atomic_inc(&rreq->nr_rd_ops); + netfs_read_from_server(rreq, subreq); + } else if (test_bit(NETFS_SREQ_SHORT_READ, &subreq->flags)) { + netfs_rreq_short_read(rreq, subreq); + } + } + + /* If we decrement nr_rd_ops to 0, the usage ref belongs to us. */ + if (atomic_dec_and_test(&rreq->nr_rd_ops)) + return true; + + wake_up_var(&rreq->nr_rd_ops); + return false; +} + +/* + * Assess the state of a read request and decide what to do next. + * + * Note that we could be in an ordinary kernel thread, on a workqueue or in + * softirq context at this point. We inherit a ref from the caller. + */ +static void netfs_rreq_assess(struct netfs_read_request *rreq, bool was_async) +{ +again: + if (!test_bit(NETFS_RREQ_FAILED, &rreq->flags) && + test_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags)) { + if (netfs_rreq_perform_resubmissions(rreq)) + goto again; + return; + } + + netfs_rreq_unlock(rreq); + + clear_bit_unlock(NETFS_RREQ_IN_PROGRESS, &rreq->flags); + wake_up_bit(&rreq->flags, NETFS_RREQ_IN_PROGRESS); + + netfs_rreq_completed(rreq, was_async); +} + +static void netfs_rreq_work(struct work_struct *work) +{ + struct netfs_read_request *rreq = + container_of(work, struct netfs_read_request, work); + netfs_rreq_assess(rreq, false); +} + +/* + * Handle the completion of all outstanding I/O operations on a read request. + * We inherit a ref from the caller. + */ +static void netfs_rreq_terminated(struct netfs_read_request *rreq, + bool was_async) +{ + if (test_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags) && + was_async) { + if (!queue_work(system_unbound_wq, &rreq->work)) + BUG(); + } else { + netfs_rreq_assess(rreq, was_async); + } +} + +/** + * netfs_subreq_terminated - Note the termination of an I/O operation. + * @subreq: The I/O request that has terminated. + * @transferred_or_error: The amount of data transferred or an error code. + * @was_async: The termination was asynchronous + * + * This tells the read helper that a contributory I/O operation has terminated, + * one way or another, and that it should integrate the results. + * + * The caller indicates in @transferred_or_error the outcome of the operation, + * supplying a positive value to indicate the number of bytes transferred, 0 to + * indicate a failure to transfer anything that should be retried or a negative + * error code. The helper will look after reissuing I/O operations as + * appropriate and writing downloaded data to the cache. + * + * If @was_async is true, the caller might be running in softirq or interrupt + * context and we can't sleep. + */ +void netfs_subreq_terminated(struct netfs_read_subrequest *subreq, + ssize_t transferred_or_error, + bool was_async) +{ + struct netfs_read_request *rreq = subreq->rreq; + int u; + + _enter("[%u]{%llx,%lx},%zd", + subreq->debug_index, subreq->start, subreq->flags, + transferred_or_error); + + if (IS_ERR_VALUE(transferred_or_error)) { + subreq->error = transferred_or_error; + goto failed; + } + + if (WARN_ON(transferred_or_error > subreq->len - subreq->transferred)) + transferred_or_error = subreq->len - subreq->transferred; + + subreq->error = 0; + subreq->transferred += transferred_or_error; + if (subreq->transferred < subreq->len) + goto incomplete; + +complete: + __clear_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags); + if (test_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags)) + set_bit(NETFS_RREQ_WRITE_TO_CACHE, &rreq->flags); + +out: + /* If we decrement nr_rd_ops to 0, the ref belongs to us. */ + u = atomic_dec_return(&rreq->nr_rd_ops); + if (u == 0) + netfs_rreq_terminated(rreq, was_async); + else if (u == 1) + wake_up_var(&rreq->nr_rd_ops); + + netfs_put_subrequest(subreq, was_async); + return; + +incomplete: + if (test_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags)) { + netfs_clear_unread(subreq); + subreq->transferred = subreq->len; + goto complete; + } + + if (transferred_or_error == 0) { + if (__test_and_set_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags)) { + subreq->error = -ENODATA; + goto failed; + } + } else { + __clear_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags); + } + + __set_bit(NETFS_SREQ_SHORT_READ, &subreq->flags); + set_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags); + goto out; + +failed: + if (subreq->source == NETFS_READ_FROM_CACHE) { + set_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags); + } else { + set_bit(NETFS_RREQ_FAILED, &rreq->flags); + rreq->error = subreq->error; + } + goto out; +} +EXPORT_SYMBOL(netfs_subreq_terminated); + +static enum netfs_read_source netfs_cache_prepare_read(struct netfs_read_subrequest *subreq, + loff_t i_size) +{ + struct netfs_read_request *rreq = subreq->rreq; + + if (subreq->start >= rreq->i_size) + return NETFS_FILL_WITH_ZEROES; + return NETFS_DOWNLOAD_FROM_SERVER; +} + +/* + * Work out what sort of subrequest the next one will be. + */ +static enum netfs_read_source +netfs_rreq_prepare_read(struct netfs_read_request *rreq, + struct netfs_read_subrequest *subreq) +{ + enum netfs_read_source source; + + _enter("%llx-%llx,%llx", subreq->start, subreq->start + subreq->len, rreq->i_size); + + source = netfs_cache_prepare_read(subreq, rreq->i_size); + if (source == NETFS_INVALID_READ) + goto out; + + if (source == NETFS_DOWNLOAD_FROM_SERVER) { + /* Call out to the netfs to let it shrink the request to fit + * its own I/O sizes and boundaries. If it shinks it here, it + * will be called again to make simultaneous calls; if it wants + * to make serial calls, it can indicate a short read and then + * we will call it again. + */ + if (subreq->len > rreq->i_size - subreq->start) + subreq->len = rreq->i_size - subreq->start; + + if (rreq->netfs_ops->clamp_length && + !rreq->netfs_ops->clamp_length(subreq)) { + source = NETFS_INVALID_READ; + goto out; + } + } + + if (WARN_ON(subreq->len == 0)) + source = NETFS_INVALID_READ; + +out: + subreq->source = source; + return source; +} + +/* + * Slice off a piece of a read request and submit an I/O request for it. + */ +static bool netfs_rreq_submit_slice(struct netfs_read_request *rreq, + unsigned int *_debug_index) +{ + struct netfs_read_subrequest *subreq; + enum netfs_read_source source; + + subreq = netfs_alloc_subrequest(rreq); + if (!subreq) + return false; + + subreq->debug_index = (*_debug_index)++; + subreq->start = rreq->start + rreq->submitted; + subreq->len = rreq->len - rreq->submitted; + + _debug("slice %llx,%zx,%zx", subreq->start, subreq->len, rreq->submitted); + list_add_tail(&subreq->rreq_link, &rreq->subrequests); + + /* Call out to the cache to find out what it can do with the remaining + * subset. It tells us in subreq->flags what it decided should be done + * and adjusts subreq->len down if the subset crosses a cache boundary. + * + * Then when we hand the subset, it can choose to take a subset of that + * (the starts must coincide), in which case, we go around the loop + * again and ask it to download the next piece. + */ + source = netfs_rreq_prepare_read(rreq, subreq); + if (source == NETFS_INVALID_READ) + goto subreq_failed; + + atomic_inc(&rreq->nr_rd_ops); + + rreq->submitted += subreq->len; + + switch (source) { + case NETFS_FILL_WITH_ZEROES: + netfs_fill_with_zeroes(rreq, subreq); + break; + case NETFS_DOWNLOAD_FROM_SERVER: + netfs_read_from_server(rreq, subreq); + break; + default: + BUG(); + } + + return true; + +subreq_failed: + rreq->error = subreq->error; + netfs_put_subrequest(subreq, false); + return false; +} + +static void netfs_rreq_expand(struct netfs_read_request *rreq, + struct readahead_control *ractl) +{ + /* Give the netfs a chance to change the request parameters. The + * resultant request must contain the original region. + */ + if (rreq->netfs_ops->expand_readahead) + rreq->netfs_ops->expand_readahead(rreq); + + /* Expand the request if the cache wants it to start earlier. Note + * that the expansion may get further extended if the VM wishes to + * insert THPs and the preferred start and/or end wind up in the middle + * of THPs. + * + * If this is the case, however, the THP size should be an integer + * multiple of the cache granule size, so we get a whole number of + * granules to deal with. + */ + if (rreq->start != readahead_pos(ractl) || + rreq->len != readahead_length(ractl)) { + readahead_expand(ractl, rreq->start, rreq->len); + rreq->start = readahead_pos(ractl); + rreq->len = readahead_length(ractl); + } +} + +/** + * netfs_readahead - Helper to manage a read request + * @ractl: The description of the readahead request + * @ops: The network filesystem's operations for the helper to use + * @netfs_priv: Private netfs data to be retained in the request + * + * Fulfil a readahead request by drawing data from the cache if possible, or + * the netfs if not. Space beyond the EOF is zero-filled. Multiple I/O + * requests from different sources will get munged together. If necessary, the + * readahead window can be expanded in either direction to a more convenient + * alighment for RPC efficiency or to make storage in the cache feasible. + * + * The calling netfs must provide a table of operations, only one of which, + * issue_op, is mandatory. It may also be passed a private token, which will + * be retained in rreq->netfs_priv and will be cleaned up by ops->cleanup(). + * + * This is usable whether or not caching is enabled. + */ +void netfs_readahead(struct readahead_control *ractl, + const struct netfs_read_request_ops *ops, + void *netfs_priv) +{ + struct netfs_read_request *rreq; + struct page *page; + unsigned int debug_index = 0; + + _enter("%lx,%x", readahead_index(ractl), readahead_count(ractl)); + + if (readahead_count(ractl) == 0) + goto cleanup; + + rreq = netfs_alloc_read_request(ops, netfs_priv, ractl->file); + if (!rreq) + goto cleanup; + rreq->mapping = ractl->mapping; + rreq->start = readahead_pos(ractl); + rreq->len = readahead_length(ractl); + + netfs_rreq_expand(rreq, ractl); + + atomic_set(&rreq->nr_rd_ops, 1); + do { + if (!netfs_rreq_submit_slice(rreq, &debug_index)) + break; + + } while (rreq->submitted < rreq->len); + + while ((page = readahead_page(ractl))) + put_page(page); + + /* If we decrement nr_rd_ops to 0, the ref belongs to us. */ + if (atomic_dec_and_test(&rreq->nr_rd_ops)) + netfs_rreq_assess(rreq, false); + return; + +cleanup: + if (netfs_priv) + ops->cleanup(ractl->mapping, netfs_priv); + return; +} +EXPORT_SYMBOL(netfs_readahead); + +/** + * netfs_page - Helper to manage a readpage request + * @file: The file to read from + * @page: The page to read + * @ops: The network filesystem's operations for the helper to use + * @netfs_priv: Private netfs data to be retained in the request + * + * Fulfil a readpage request by drawing data from the cache if possible, or the + * netfs if not. Space beyond the EOF is zero-filled. Multiple I/O requests + * from different sources will get munged together. + * + * The calling netfs must provide a table of operations, only one of which, + * issue_op, is mandatory. It may also be passed a private token, which will + * be retained in rreq->netfs_priv and will be cleaned up by ops->cleanup(). + * + * This is usable whether or not caching is enabled. + */ +int netfs_readpage(struct file *file, + struct page *page, + const struct netfs_read_request_ops *ops, + void *netfs_priv) +{ + struct netfs_read_request *rreq; + unsigned int debug_index = 0; + int ret; + + _enter("%lx", page->index); + + rreq = netfs_alloc_read_request(ops, netfs_priv, file); + if (!rreq) { + if (netfs_priv) + ops->cleanup(netfs_priv, page->mapping); + unlock_page(page); + return -ENOMEM; + } + rreq->mapping = page->mapping; + rreq->start = page->index * PAGE_SIZE; + rreq->len = thp_size(page); + + netfs_get_read_request(rreq); + + atomic_set(&rreq->nr_rd_ops, 1); + do { + if (!netfs_rreq_submit_slice(rreq, &debug_index)) + break; + + } while (rreq->submitted < rreq->len); + + /* Keep nr_rd_ops incremented so that the ref always belongs to us, and + * the service code isn't punted off to a random thread pool to + * process. + */ + do { + wait_var_event(&rreq->nr_rd_ops, atomic_read(&rreq->nr_rd_ops) == 1); + netfs_rreq_assess(rreq, false); + } while (test_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags)); + + ret = rreq->error; + if (ret == 0 && rreq->submitted < rreq->len) + ret = -EIO; + netfs_put_read_request(rreq, false); + return ret; +} +EXPORT_SYMBOL(netfs_readpage); diff --git a/include/linux/netfs.h b/include/linux/netfs.h index 5ed8c6dc7453..d5453f625610 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -14,6 +14,8 @@ #ifndef _LINUX_NETFS_H #define _LINUX_NETFS_H +#include +#include #include /* @@ -55,4 +57,84 @@ static inline void wait_on_page_fscache(struct page *page) wait_on_page_bit(compound_head(page), PG_fscache); } +enum netfs_read_source { + NETFS_FILL_WITH_ZEROES, + NETFS_DOWNLOAD_FROM_SERVER, + NETFS_READ_FROM_CACHE, + NETFS_INVALID_READ, +} __mode(byte); + +/* + * Descriptor for a single component subrequest. + */ +struct netfs_read_subrequest { + struct netfs_read_request *rreq; /* Supervising read request */ + struct list_head rreq_link; /* Link in rreq->subrequests */ + loff_t start; /* Where to start the I/O */ + size_t len; /* Size of the I/O */ + size_t transferred; /* Amount of data transferred */ + refcount_t usage; + short error; /* 0 or error that occurred */ + unsigned short debug_index; /* Index in list (for debugging output) */ + enum netfs_read_source source; /* Where to read from */ + unsigned long flags; +#define NETFS_SREQ_WRITE_TO_CACHE 0 /* Set if should write to cache */ +#define NETFS_SREQ_CLEAR_TAIL 1 /* Set if the rest of the read should be cleared */ +#define NETFS_SREQ_SHORT_READ 2 /* Set if there was a short read from the cache */ +#define NETFS_SREQ_SEEK_DATA_READ 3 /* Set if ->read() should SEEK_DATA first */ +#define NETFS_SREQ_NO_PROGRESS 4 /* Set if we didn't manage to read any data */ +}; + +/* + * Descriptor for a read helper request. This is used to make multiple I/O + * requests on a variety of sources and then stitch the result together. + */ +struct netfs_read_request { + struct work_struct work; + struct inode *inode; /* The file being accessed */ + struct address_space *mapping; /* The mapping being accessed */ + struct list_head subrequests; /* Requests to fetch I/O from disk or net */ + void *netfs_priv; /* Private data for the netfs */ + atomic_t nr_rd_ops; /* Number of read ops in progress */ + size_t submitted; /* Amount submitted for I/O so far */ + size_t len; /* Length of the request */ + short error; /* 0 or error that occurred */ + loff_t i_size; /* Size of the file */ + loff_t start; /* Start position */ + pgoff_t no_unlock_page; /* Don't unlock this page after read */ + refcount_t usage; + unsigned long flags; +#define NETFS_RREQ_INCOMPLETE_IO 0 /* Some ioreqs terminated short or with error */ +#define NETFS_RREQ_WRITE_TO_CACHE 1 /* Need to write to the cache */ +#define NETFS_RREQ_NO_UNLOCK_PAGE 2 /* Don't unlock no_unlock_page on completion */ +#define NETFS_RREQ_DONT_UNLOCK_PAGES 3 /* Don't unlock the pages on completion */ +#define NETFS_RREQ_FAILED 4 /* The request failed */ +#define NETFS_RREQ_IN_PROGRESS 5 /* Unlocked when the request completes */ + const struct netfs_read_request_ops *netfs_ops; +}; + +/* + * Operations the network filesystem can/must provide to the helpers. + */ +struct netfs_read_request_ops { + void (*init_rreq)(struct netfs_read_request *rreq, struct file *file); + void (*expand_readahead)(struct netfs_read_request *rreq); + bool (*clamp_length)(struct netfs_read_subrequest *subreq); + void (*issue_op)(struct netfs_read_subrequest *subreq); + bool (*is_still_valid)(struct netfs_read_request *rreq); + void (*done)(struct netfs_read_request *rreq); + void (*cleanup)(struct address_space *mapping, void *netfs_priv); +}; + +struct readahead_control; +extern void netfs_readahead(struct readahead_control *, + const struct netfs_read_request_ops *, + void *); +extern int netfs_readpage(struct file *, + struct page *, + const struct netfs_read_request_ops *, + void *); + +extern void netfs_subreq_terminated(struct netfs_read_subrequest *, ssize_t, bool); + #endif /* _LINUX_NETFS_H */ From patchwork Wed Mar 10 16:56:26 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 12128559 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id ABC69C433E0 for ; Wed, 10 Mar 2021 16:56:40 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 406CB64FC6 for ; Wed, 10 Mar 2021 16:56:40 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 406CB64FC6 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id D92FF8D01D8; Wed, 10 Mar 2021 11:56:39 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id D6A338D01D5; Wed, 10 Mar 2021 11:56:39 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id BE5C18D01D8; Wed, 10 Mar 2021 11:56:39 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0079.hostedemail.com [216.40.44.79]) by kanga.kvack.org (Postfix) with ESMTP id A2D2C8D01D5 for ; Wed, 10 Mar 2021 11:56:39 -0500 (EST) Received: from smtpin07.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 51B73181C43A9 for ; Wed, 10 Mar 2021 16:56:39 +0000 (UTC) X-FDA: 77904568518.07.428BCC0 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [63.128.21.124]) by imf08.hostedemail.com (Postfix) with ESMTP id E07258019151 for ; Wed, 10 Mar 2021 16:56:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1615395398; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=4GCv4I1iASJ1LZas4NpXejo/8oEgSQY2RrQvJ/ewopk=; b=QtYeLIT5q0K5CX8Bg6QuG0TG9uCo4kGBzL8w7xXSeNdOp+ucHEHslOk2hU8hub6SfRDpLt +dywY3oG6HlA+7kYdVloqN3eii8T2/BSQOYcxMg5V1OuWzy4IqpUg/vIPXAeSZq/M6wELX M8d4xP6mt+KSgiZA2hoDSB8GrvuiFA8= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-214-HTL4YWLlOEuKZYNz0HIkig-1; Wed, 10 Mar 2021 11:56:36 -0500 X-MC-Unique: HTL4YWLlOEuKZYNz0HIkig-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 5361380432D; Wed, 10 Mar 2021 16:56:34 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-118-152.rdu2.redhat.com [10.10.118.152]) by smtp.corp.redhat.com (Postfix) with ESMTP id 0A7732CFDB; Wed, 10 Mar 2021 16:56:27 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v4 09/28] netfs: Add tracepoints From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: Jeff Layton , Matthew Wilcox , linux-mm@kvack.org, linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 10 Mar 2021 16:56:26 +0000 Message-ID: <161539538693.286939.10171713520419106334.stgit@warthog.procyon.org.uk> In-Reply-To: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> References: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Stat-Signature: jwqg6y836mu6a67r9sowscs3qj35etfh X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: E07258019151 Received-SPF: none (redhat.com>: No applicable sender policy available) receiver=imf08; identity=mailfrom; envelope-from=""; helo=us-smtp-delivery-124.mimecast.com; client-ip=63.128.21.124 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1615395391-213783 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Add three tracepoints to track the activity of the read helpers: (1) netfs/netfs_read This logs entry to the read helpers and also expansion of the range in a readahead request. (2) netfs/netfs_rreq This logs the progress of netfs_read_request objects which track read requests. A read request may be a compound of multiple subrequests. (3) netfs/netfs_sreq This logs the progress of netfs_read_subrequest objects, which track the contributions from various sources to a read request. Signed-off-by: David Howells Reviewed-by: Jeff Layton cc: Matthew Wilcox cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/161118138060.1232039.5353374588021776217.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161033468.2537118.14021843889844001905.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340395843.1303470.7355519662919639648.stgit@warthog.procyon.org.uk/ # v3 --- fs/netfs/read_helper.c | 28 ++++++ include/linux/netfs.h | 2 include/trace/events/netfs.h | 199 ++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 229 insertions(+) create mode 100644 include/trace/events/netfs.h diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c index c5ed0dcc9571..e64d9dad0c36 100644 --- a/fs/netfs/read_helper.c +++ b/fs/netfs/read_helper.c @@ -16,6 +16,8 @@ #include #include #include "internal.h" +#define CREATE_TRACE_POINTS +#include MODULE_DESCRIPTION("Network fs support"); MODULE_AUTHOR("Red Hat, Inc."); @@ -39,6 +41,7 @@ static struct netfs_read_request *netfs_alloc_read_request( const struct netfs_read_request_ops *ops, void *netfs_priv, struct file *file) { + static atomic_t debug_ids; struct netfs_read_request *rreq; rreq = kzalloc(sizeof(struct netfs_read_request), GFP_KERNEL); @@ -47,6 +50,7 @@ static struct netfs_read_request *netfs_alloc_read_request( rreq->netfs_priv = netfs_priv; rreq->inode = file_inode(file); rreq->i_size = i_size_read(rreq->inode); + rreq->debug_id = atomic_inc_return(&debug_ids); INIT_LIST_HEAD(&rreq->subrequests); INIT_WORK(&rreq->work, netfs_rreq_work); refcount_set(&rreq->usage, 1); @@ -82,6 +86,7 @@ static void netfs_free_read_request(struct work_struct *work) netfs_rreq_clear_subreqs(rreq, false); if (rreq->netfs_priv) rreq->netfs_ops->cleanup(rreq->mapping, rreq->netfs_priv); + trace_netfs_rreq(rreq, netfs_rreq_trace_free); kfree(rreq); } @@ -127,6 +132,7 @@ static void __netfs_put_subrequest(struct netfs_read_subrequest *subreq, { struct netfs_read_request *rreq = subreq->rreq; + trace_netfs_sreq(subreq, netfs_sreq_trace_free); kfree(subreq); netfs_put_read_request(rreq, was_async); } @@ -181,6 +187,7 @@ static void netfs_read_from_server(struct netfs_read_request *rreq, */ static void netfs_rreq_completed(struct netfs_read_request *rreq, bool was_async) { + trace_netfs_rreq(rreq, netfs_rreq_trace_done); netfs_rreq_clear_subreqs(rreq, was_async); netfs_put_read_request(rreq, was_async); } @@ -219,6 +226,8 @@ static void netfs_rreq_unlock(struct netfs_read_request *rreq) iopos = 0; subreq_failed = (subreq->error < 0); + trace_netfs_rreq(rreq, netfs_rreq_trace_unlock); + rcu_read_lock(); xas_for_each(&xas, page, last_page) { unsigned int pgpos = (page->index - start_page) * PAGE_SIZE; @@ -279,6 +288,8 @@ static void netfs_rreq_short_read(struct netfs_read_request *rreq, __clear_bit(NETFS_SREQ_SHORT_READ, &subreq->flags); __set_bit(NETFS_SREQ_SEEK_DATA_READ, &subreq->flags); + trace_netfs_sreq(subreq, netfs_sreq_trace_resubmit_short); + netfs_get_read_subrequest(subreq); atomic_inc(&rreq->nr_rd_ops); netfs_read_from_server(rreq, subreq); @@ -294,6 +305,8 @@ static bool netfs_rreq_perform_resubmissions(struct netfs_read_request *rreq) WARN_ON(in_interrupt()); + trace_netfs_rreq(rreq, netfs_rreq_trace_resubmit); + /* We don't want terminating submissions trying to wake us up whilst * we're still going through the list. */ @@ -306,6 +319,7 @@ static bool netfs_rreq_perform_resubmissions(struct netfs_read_request *rreq) break; subreq->source = NETFS_DOWNLOAD_FROM_SERVER; subreq->error = 0; + trace_netfs_sreq(subreq, netfs_sreq_trace_download_instead); netfs_get_read_subrequest(subreq); atomic_inc(&rreq->nr_rd_ops); netfs_read_from_server(rreq, subreq); @@ -330,6 +344,8 @@ static bool netfs_rreq_perform_resubmissions(struct netfs_read_request *rreq) */ static void netfs_rreq_assess(struct netfs_read_request *rreq, bool was_async) { + trace_netfs_rreq(rreq, netfs_rreq_trace_assess); + again: if (!test_bit(NETFS_RREQ_FAILED, &rreq->flags) && test_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags)) { @@ -417,6 +433,8 @@ void netfs_subreq_terminated(struct netfs_read_subrequest *subreq, set_bit(NETFS_RREQ_WRITE_TO_CACHE, &rreq->flags); out: + trace_netfs_sreq(subreq, netfs_sreq_trace_terminated); + /* If we decrement nr_rd_ops to 0, the ref belongs to us. */ u = atomic_dec_return(&rreq->nr_rd_ops); if (u == 0) @@ -505,6 +523,7 @@ netfs_rreq_prepare_read(struct netfs_read_request *rreq, out: subreq->source = source; + trace_netfs_sreq(subreq, netfs_sreq_trace_prepare); return source; } @@ -544,6 +563,7 @@ static bool netfs_rreq_submit_slice(struct netfs_read_request *rreq, rreq->submitted += subreq->len; + trace_netfs_sreq(subreq, netfs_sreq_trace_submit); switch (source) { case NETFS_FILL_WITH_ZEROES: netfs_fill_with_zeroes(rreq, subreq); @@ -586,6 +606,9 @@ static void netfs_rreq_expand(struct netfs_read_request *rreq, readahead_expand(ractl, rreq->start, rreq->len); rreq->start = readahead_pos(ractl); rreq->len = readahead_length(ractl); + + trace_netfs_read(rreq, readahead_pos(ractl), readahead_length(ractl), + netfs_read_trace_expanded); } } @@ -627,6 +650,9 @@ void netfs_readahead(struct readahead_control *ractl, rreq->start = readahead_pos(ractl); rreq->len = readahead_length(ractl); + trace_netfs_read(rreq, readahead_pos(ractl), readahead_length(ractl), + netfs_read_trace_readahead); + netfs_rreq_expand(rreq, ractl); atomic_set(&rreq->nr_rd_ops, 1); @@ -690,6 +716,8 @@ int netfs_readpage(struct file *file, rreq->start = page->index * PAGE_SIZE; rreq->len = thp_size(page); + trace_netfs_read(rreq, rreq->start, rreq->len, netfs_read_trace_readpage); + netfs_get_read_request(rreq); atomic_set(&rreq->nr_rd_ops, 1); diff --git a/include/linux/netfs.h b/include/linux/netfs.h index d5453f625610..156a19f47213 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -95,6 +95,8 @@ struct netfs_read_request { struct address_space *mapping; /* The mapping being accessed */ struct list_head subrequests; /* Requests to fetch I/O from disk or net */ void *netfs_priv; /* Private data for the netfs */ + unsigned int debug_id; + unsigned int cookie_debug_id; atomic_t nr_rd_ops; /* Number of read ops in progress */ size_t submitted; /* Amount submitted for I/O so far */ size_t len; /* Length of the request */ diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h new file mode 100644 index 000000000000..12ad382764c5 --- /dev/null +++ b/include/trace/events/netfs.h @@ -0,0 +1,199 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* Network filesystem support module tracepoints + * + * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM netfs + +#if !defined(_TRACE_NETFS_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_NETFS_H + +#include + +/* + * Define enums for tracing information. + */ +#ifndef __NETFS_DECLARE_TRACE_ENUMS_ONCE_ONLY +#define __NETFS_DECLARE_TRACE_ENUMS_ONCE_ONLY + +enum netfs_read_trace { + netfs_read_trace_expanded, + netfs_read_trace_readahead, + netfs_read_trace_readpage, +}; + +enum netfs_rreq_trace { + netfs_rreq_trace_assess, + netfs_rreq_trace_done, + netfs_rreq_trace_free, + netfs_rreq_trace_resubmit, + netfs_rreq_trace_unlock, + netfs_rreq_trace_unmark, + netfs_rreq_trace_write, +}; + +enum netfs_sreq_trace { + netfs_sreq_trace_download_instead, + netfs_sreq_trace_free, + netfs_sreq_trace_prepare, + netfs_sreq_trace_resubmit_short, + netfs_sreq_trace_submit, + netfs_sreq_trace_terminated, + netfs_sreq_trace_write, + netfs_sreq_trace_write_term, +}; + +#endif + +#define netfs_read_traces \ + EM(netfs_read_trace_expanded, "EXPANDED ") \ + EM(netfs_read_trace_readahead, "READAHEAD") \ + E_(netfs_read_trace_readpage, "READPAGE ") + +#define netfs_rreq_traces \ + EM(netfs_rreq_trace_assess, "ASSESS") \ + EM(netfs_rreq_trace_done, "DONE ") \ + EM(netfs_rreq_trace_free, "FREE ") \ + EM(netfs_rreq_trace_resubmit, "RESUBM") \ + EM(netfs_rreq_trace_unlock, "UNLOCK") \ + EM(netfs_rreq_trace_unmark, "UNMARK") \ + E_(netfs_rreq_trace_write, "WRITE ") + +#define netfs_sreq_sources \ + EM(NETFS_FILL_WITH_ZEROES, "ZERO") \ + EM(NETFS_DOWNLOAD_FROM_SERVER, "DOWN") \ + EM(NETFS_READ_FROM_CACHE, "READ") \ + E_(NETFS_INVALID_READ, "INVL") \ + +#define netfs_sreq_traces \ + EM(netfs_sreq_trace_download_instead, "RDOWN") \ + EM(netfs_sreq_trace_free, "FREE ") \ + EM(netfs_sreq_trace_prepare, "PREP ") \ + EM(netfs_sreq_trace_resubmit_short, "SHORT") \ + EM(netfs_sreq_trace_submit, "SUBMT") \ + EM(netfs_sreq_trace_terminated, "TERM ") \ + EM(netfs_sreq_trace_write, "WRITE") \ + E_(netfs_sreq_trace_write_term, "WTERM") + + +/* + * Export enum symbols via userspace. + */ +#undef EM +#undef E_ +#define EM(a, b) TRACE_DEFINE_ENUM(a); +#define E_(a, b) TRACE_DEFINE_ENUM(a); + +netfs_read_traces; +netfs_rreq_traces; +netfs_sreq_sources; +netfs_sreq_traces; + +/* + * Now redefine the EM() and E_() macros to map the enums to the strings that + * will be printed in the output. + */ +#undef EM +#undef E_ +#define EM(a, b) { a, b }, +#define E_(a, b) { a, b } + +TRACE_EVENT(netfs_read, + TP_PROTO(struct netfs_read_request *rreq, + loff_t start, size_t len, + enum netfs_read_trace what), + + TP_ARGS(rreq, start, len, what), + + TP_STRUCT__entry( + __field(unsigned int, rreq ) + __field(unsigned int, cookie ) + __field(loff_t, start ) + __field(size_t, len ) + __field(enum netfs_read_trace, what ) + ), + + TP_fast_assign( + __entry->rreq = rreq->debug_id; + __entry->cookie = rreq->cookie_debug_id; + __entry->start = start; + __entry->len = len; + __entry->what = what; + ), + + TP_printk("R=%08x %s c=%08x s=%llx %zx", + __entry->rreq, + __print_symbolic(__entry->what, netfs_read_traces), + __entry->cookie, + __entry->start, __entry->len) + ); + +TRACE_EVENT(netfs_rreq, + TP_PROTO(struct netfs_read_request *rreq, + enum netfs_rreq_trace what), + + TP_ARGS(rreq, what), + + TP_STRUCT__entry( + __field(unsigned int, rreq ) + __field(unsigned short, flags ) + __field(enum netfs_rreq_trace, what ) + ), + + TP_fast_assign( + __entry->rreq = rreq->debug_id; + __entry->flags = rreq->flags; + __entry->what = what; + ), + + TP_printk("R=%08x %s f=%02x", + __entry->rreq, + __print_symbolic(__entry->what, netfs_rreq_traces), + __entry->flags) + ); + +TRACE_EVENT(netfs_sreq, + TP_PROTO(struct netfs_read_subrequest *sreq, + enum netfs_sreq_trace what), + + TP_ARGS(sreq, what), + + TP_STRUCT__entry( + __field(unsigned int, rreq ) + __field(unsigned short, index ) + __field(short, error ) + __field(unsigned short, flags ) + __field(enum netfs_read_source, source ) + __field(enum netfs_sreq_trace, what ) + __field(size_t, len ) + __field(size_t, transferred ) + __field(loff_t, start ) + ), + + TP_fast_assign( + __entry->rreq = sreq->rreq->debug_id; + __entry->index = sreq->debug_index; + __entry->error = sreq->error; + __entry->flags = sreq->flags; + __entry->source = sreq->source; + __entry->what = what; + __entry->len = sreq->len; + __entry->transferred = sreq->transferred; + __entry->start = sreq->start; + ), + + TP_printk("R=%08x[%u] %s %s f=%02x s=%llx %zx/%zx e=%d", + __entry->rreq, __entry->index, + __print_symbolic(__entry->what, netfs_sreq_traces), + __print_symbolic(__entry->source, netfs_sreq_sources), + __entry->flags, + __entry->start, __entry->transferred, __entry->len, + __entry->error) + ); + +#endif /* _TRACE_NETFS_H */ + +/* This part must be outside protection */ +#include From patchwork Wed Mar 10 16:56:39 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 12128561 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C237FC433DB for ; Wed, 10 Mar 2021 16:56:58 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 5E74764FC6 for ; Wed, 10 Mar 2021 16:56:58 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5E74764FC6 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id ECB6A8D01D9; Wed, 10 Mar 2021 11:56:57 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id EA1B98D01D5; Wed, 10 Mar 2021 11:56:57 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D432A8D01D9; Wed, 10 Mar 2021 11:56:57 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0003.hostedemail.com [216.40.44.3]) by kanga.kvack.org (Postfix) with ESMTP id B90288D01D5 for ; Wed, 10 Mar 2021 11:56:57 -0500 (EST) Received: from smtpin08.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 6A4F4181C4420 for ; Wed, 10 Mar 2021 16:56:57 +0000 (UTC) X-FDA: 77904569274.08.9FD616F Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [63.128.21.124]) by imf12.hostedemail.com (Postfix) with ESMTP id 2893CDC for ; Wed, 10 Mar 2021 16:56:53 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1615395416; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Rkh5lVhzzHqr4Q+Ob+vOhztGy8XP0TYRI88h8la+vqA=; b=ej69KdFlz2pyFWRF0+vkyEXFJlKG26Sc2q94qKG2jC76oTLi90D8K4SVm7x6nGqtjC5g6h 98Xf4M8V1WrG2plStslYZ3S+Xl27OIKwltBY2/JjCxITCmYQvjvepsLV5Yw22ckboM4Ycn dknYe3S7GOWXBsfuc1LCUqRwvP0iqVk= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-466-KNSDZm0nP0adO2WmHCCFnw-1; Wed, 10 Mar 2021 11:56:52 -0500 X-MC-Unique: KNSDZm0nP0adO2WmHCCFnw-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 3099B8189E3; Wed, 10 Mar 2021 16:56:50 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-118-152.rdu2.redhat.com [10.10.118.152]) by smtp.corp.redhat.com (Postfix) with ESMTP id 156B460C6E; Wed, 10 Mar 2021 16:56:40 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v4 10/28] netfs: Gather stats From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: Jeff Layton , Matthew Wilcox , linux-mm@kvack.org, linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 10 Mar 2021 16:56:39 +0000 Message-ID: <161539539959.286939.6794352576462965914.stgit@warthog.procyon.org.uk> In-Reply-To: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> References: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Stat-Signature: zw48xzwrdor464dko1r17uqjm1idp5te X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 2893CDC Received-SPF: none (redhat.com>: No applicable sender policy available) receiver=imf12; identity=mailfrom; envelope-from=""; helo=us-smtp-delivery-124.mimecast.com; client-ip=63.128.21.124 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1615395413-610401 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Gather statistics from the netfs interface that can be exported through a seqfile. This is intended to be called by a later patch when viewing /proc/fs/fscache/stats. Signed-off-by: David Howells Reviewed-by: Jeff Layton cc: Matthew Wilcox cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/161118139247.1232039.10556850937548511068.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161034669.2537118.2761232524997091480.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340397101.1303470.17581910581108378458.stgit@warthog.procyon.org.uk/ # v3 --- fs/netfs/Kconfig | 15 +++++++++++++ fs/netfs/Makefile | 3 +-- fs/netfs/internal.h | 34 ++++++++++++++++++++++++++++++ fs/netfs/read_helper.c | 23 ++++++++++++++++++++ fs/netfs/stats.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++ include/linux/netfs.h | 1 + 6 files changed, 128 insertions(+), 2 deletions(-) create mode 100644 fs/netfs/stats.c diff --git a/fs/netfs/Kconfig b/fs/netfs/Kconfig index 2ebf90e6ca95..578112713703 100644 --- a/fs/netfs/Kconfig +++ b/fs/netfs/Kconfig @@ -6,3 +6,18 @@ config NETFS_SUPPORT This option enables support for network filesystems, including helpers for high-level buffered I/O, abstracting out read segmentation, local caching and transparent huge page support. + +config NETFS_STATS + bool "Gather statistical information on local caching" + depends on NETFS_SUPPORT && PROC_FS + help + This option causes statistical information to be gathered on local + caching and exported through file: + + /proc/fs/fscache/stats + + The gathering of statistics adds a certain amount of overhead to + execution as there are a quite a few stats gathered, and on a + multi-CPU system these may be on cachelines that keep bouncing + between CPUs. On the other hand, the stats are very useful for + debugging purposes. Saying 'Y' here is recommended. diff --git a/fs/netfs/Makefile b/fs/netfs/Makefile index 4b4eff2ba369..c15bfc966d96 100644 --- a/fs/netfs/Makefile +++ b/fs/netfs/Makefile @@ -1,6 +1,5 @@ # SPDX-License-Identifier: GPL-2.0 -netfs-y := \ - read_helper.o +netfs-y := read_helper.o stats.o obj-$(CONFIG_NETFS_SUPPORT) := netfs.o diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h index ee665c0e7dc8..98b6f4516da1 100644 --- a/fs/netfs/internal.h +++ b/fs/netfs/internal.h @@ -16,8 +16,42 @@ */ extern unsigned int netfs_debug; +/* + * stats.c + */ +#ifdef CONFIG_NETFS_STATS +extern atomic_t netfs_n_rh_readahead; +extern atomic_t netfs_n_rh_readpage; +extern atomic_t netfs_n_rh_rreq; +extern atomic_t netfs_n_rh_sreq; +extern atomic_t netfs_n_rh_download; +extern atomic_t netfs_n_rh_download_done; +extern atomic_t netfs_n_rh_download_failed; +extern atomic_t netfs_n_rh_download_instead; +extern atomic_t netfs_n_rh_read; +extern atomic_t netfs_n_rh_read_done; +extern atomic_t netfs_n_rh_read_failed; +extern atomic_t netfs_n_rh_zero; +extern atomic_t netfs_n_rh_short_read; +extern atomic_t netfs_n_rh_write; +extern atomic_t netfs_n_rh_write_done; +extern atomic_t netfs_n_rh_write_failed; + + +static inline void netfs_stat(atomic_t *stat) +{ + atomic_inc(stat); +} + +static inline void netfs_stat_d(atomic_t *stat) +{ + atomic_dec(stat); +} + +#else #define netfs_stat(x) do {} while(0) #define netfs_stat_d(x) do {} while(0) +#endif /*****************************************************************************/ /* diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c index e64d9dad0c36..8b03fcc2f8f1 100644 --- a/fs/netfs/read_helper.c +++ b/fs/netfs/read_helper.c @@ -56,6 +56,7 @@ static struct netfs_read_request *netfs_alloc_read_request( refcount_set(&rreq->usage, 1); __set_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags); ops->init_rreq(rreq, file); + netfs_stat(&netfs_n_rh_rreq); } return rreq; @@ -88,6 +89,7 @@ static void netfs_free_read_request(struct work_struct *work) rreq->netfs_ops->cleanup(rreq->mapping, rreq->netfs_priv); trace_netfs_rreq(rreq, netfs_rreq_trace_free); kfree(rreq); + netfs_stat_d(&netfs_n_rh_rreq); } static void netfs_put_read_request(struct netfs_read_request *rreq, bool was_async) @@ -117,6 +119,7 @@ static struct netfs_read_subrequest *netfs_alloc_subrequest( refcount_set(&subreq->usage, 2); subreq->rreq = rreq; netfs_get_read_request(rreq); + netfs_stat(&netfs_n_rh_sreq); } return subreq; @@ -134,6 +137,7 @@ static void __netfs_put_subrequest(struct netfs_read_subrequest *subreq, trace_netfs_sreq(subreq, netfs_sreq_trace_free); kfree(subreq); + netfs_stat_d(&netfs_n_rh_sreq); netfs_put_read_request(rreq, was_async); } @@ -156,6 +160,7 @@ static void netfs_clear_unread(struct netfs_read_subrequest *subreq) static void netfs_fill_with_zeroes(struct netfs_read_request *rreq, struct netfs_read_subrequest *subreq) { + netfs_stat(&netfs_n_rh_zero); __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); netfs_subreq_terminated(subreq, 0, false); } @@ -179,6 +184,7 @@ static void netfs_fill_with_zeroes(struct netfs_read_request *rreq, static void netfs_read_from_server(struct netfs_read_request *rreq, struct netfs_read_subrequest *subreq) { + netfs_stat(&netfs_n_rh_download); rreq->netfs_ops->issue_op(subreq); } @@ -288,6 +294,7 @@ static void netfs_rreq_short_read(struct netfs_read_request *rreq, __clear_bit(NETFS_SREQ_SHORT_READ, &subreq->flags); __set_bit(NETFS_SREQ_SEEK_DATA_READ, &subreq->flags); + netfs_stat(&netfs_n_rh_short_read); trace_netfs_sreq(subreq, netfs_sreq_trace_resubmit_short); netfs_get_read_subrequest(subreq); @@ -319,6 +326,7 @@ static bool netfs_rreq_perform_resubmissions(struct netfs_read_request *rreq) break; subreq->source = NETFS_DOWNLOAD_FROM_SERVER; subreq->error = 0; + netfs_stat(&netfs_n_rh_download_instead); trace_netfs_sreq(subreq, netfs_sreq_trace_download_instead); netfs_get_read_subrequest(subreq); atomic_inc(&rreq->nr_rd_ops); @@ -414,6 +422,17 @@ void netfs_subreq_terminated(struct netfs_read_subrequest *subreq, subreq->debug_index, subreq->start, subreq->flags, transferred_or_error); + switch (subreq->source) { + case NETFS_READ_FROM_CACHE: + netfs_stat(&netfs_n_rh_read_done); + break; + case NETFS_DOWNLOAD_FROM_SERVER: + netfs_stat(&netfs_n_rh_download_done); + break; + default: + break; + } + if (IS_ERR_VALUE(transferred_or_error)) { subreq->error = transferred_or_error; goto failed; @@ -467,8 +486,10 @@ void netfs_subreq_terminated(struct netfs_read_subrequest *subreq, failed: if (subreq->source == NETFS_READ_FROM_CACHE) { + netfs_stat(&netfs_n_rh_read_failed); set_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags); } else { + netfs_stat(&netfs_n_rh_download_failed); set_bit(NETFS_RREQ_FAILED, &rreq->flags); rreq->error = subreq->error; } @@ -650,6 +671,7 @@ void netfs_readahead(struct readahead_control *ractl, rreq->start = readahead_pos(ractl); rreq->len = readahead_length(ractl); + netfs_stat(&netfs_n_rh_readahead); trace_netfs_read(rreq, readahead_pos(ractl), readahead_length(ractl), netfs_read_trace_readahead); @@ -716,6 +738,7 @@ int netfs_readpage(struct file *file, rreq->start = page->index * PAGE_SIZE; rreq->len = thp_size(page); + netfs_stat(&netfs_n_rh_readpage); trace_netfs_read(rreq, rreq->start, rreq->len, netfs_read_trace_readpage); netfs_get_read_request(rreq); diff --git a/fs/netfs/stats.c b/fs/netfs/stats.c new file mode 100644 index 000000000000..df6ff5718f25 --- /dev/null +++ b/fs/netfs/stats.c @@ -0,0 +1,54 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* Netfs support statistics + * + * Copyright (C) 2021 Red Hat, Inc. All Rights Reserved. + * Written by David Howells (dhowells@redhat.com) + */ + +#include +#include +#include +#include "internal.h" + +atomic_t netfs_n_rh_readahead; +atomic_t netfs_n_rh_readpage; +atomic_t netfs_n_rh_rreq; +atomic_t netfs_n_rh_sreq; +atomic_t netfs_n_rh_download; +atomic_t netfs_n_rh_download_done; +atomic_t netfs_n_rh_download_failed; +atomic_t netfs_n_rh_download_instead; +atomic_t netfs_n_rh_read; +atomic_t netfs_n_rh_read_done; +atomic_t netfs_n_rh_read_failed; +atomic_t netfs_n_rh_zero; +atomic_t netfs_n_rh_short_read; +atomic_t netfs_n_rh_write; +atomic_t netfs_n_rh_write_done; +atomic_t netfs_n_rh_write_failed; + +void netfs_stats_show(struct seq_file *m) +{ + seq_printf(m, "RdHelp : RA=%u RP=%u rr=%u sr=%u\n", + atomic_read(&netfs_n_rh_readahead), + atomic_read(&netfs_n_rh_readpage), + atomic_read(&netfs_n_rh_rreq), + atomic_read(&netfs_n_rh_sreq)); + seq_printf(m, "RdHelp : ZR=%u sh=%u\n", + atomic_read(&netfs_n_rh_zero), + atomic_read(&netfs_n_rh_short_read)); + seq_printf(m, "RdHelp : DL=%u ds=%u df=%u di=%u\n", + atomic_read(&netfs_n_rh_download), + atomic_read(&netfs_n_rh_download_done), + atomic_read(&netfs_n_rh_download_failed), + atomic_read(&netfs_n_rh_download_instead)); + seq_printf(m, "RdHelp : RD=%u rs=%u rf=%u\n", + atomic_read(&netfs_n_rh_read), + atomic_read(&netfs_n_rh_read_done), + atomic_read(&netfs_n_rh_read_failed)); + seq_printf(m, "RdHelp : WR=%u ws=%u wf=%u\n", + atomic_read(&netfs_n_rh_write), + atomic_read(&netfs_n_rh_write_done), + atomic_read(&netfs_n_rh_write_failed)); +} +EXPORT_SYMBOL(netfs_stats_show); diff --git a/include/linux/netfs.h b/include/linux/netfs.h index 156a19f47213..f2a0c2dc425b 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -138,5 +138,6 @@ extern int netfs_readpage(struct file *, void *); extern void netfs_subreq_terminated(struct netfs_read_subrequest *, ssize_t, bool); +extern void netfs_stats_show(struct seq_file *); #endif /* _LINUX_NETFS_H */ From patchwork Wed Mar 10 16:56:55 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 12128563 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B7F51C433DB for ; Wed, 10 Mar 2021 16:57:12 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 4DA4064FC5 for ; Wed, 10 Mar 2021 16:57:12 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4DA4064FC5 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id E2E8B8D01DA; Wed, 10 Mar 2021 11:57:11 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id E056E8D01D5; Wed, 10 Mar 2021 11:57:11 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C7EFE8D01DA; Wed, 10 Mar 2021 11:57:11 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0208.hostedemail.com [216.40.44.208]) by kanga.kvack.org (Postfix) with ESMTP id AB1CA8D01D5 for ; Wed, 10 Mar 2021 11:57:11 -0500 (EST) Received: from smtpin21.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 68329181C5540 for ; Wed, 10 Mar 2021 16:57:11 +0000 (UTC) X-FDA: 77904569862.21.6055543 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [63.128.21.124]) by imf14.hostedemail.com (Postfix) with ESMTP id 0F984C0007D0 for ; Wed, 10 Mar 2021 16:57:05 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1615395429; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Anevj5TH4/GDh/44apd4qtX1bFuDOsvHHbarWQ52AHc=; b=iYTQcHyjKNCEUHJRQMRHJTz93M7aPxj0A8N6mUU22zmHkT08bshR7ZJifqTy4PGy9tasiW qI5IftvDjqIxEhqHJMqMwlCtmzJ15/htpYM3S1c8j5Tqe8TR8P9gVcuDcFYasKazQXyUwc rABPuKJ85T7PCvIlvlQd29mg8QpCZG4= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-398-CX8fvXziMZWFg5MnErcmKg-1; Wed, 10 Mar 2021 11:57:05 -0500 X-MC-Unique: CX8fvXziMZWFg5MnErcmKg-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 7C26DE5762; Wed, 10 Mar 2021 16:57:03 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-118-152.rdu2.redhat.com [10.10.118.152]) by smtp.corp.redhat.com (Postfix) with ESMTP id 295F41F076; Wed, 10 Mar 2021 16:56:56 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v4 11/28] netfs: Add write_begin helper From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: Jeff Layton , Matthew Wilcox , linux-mm@kvack.org, linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 10 Mar 2021 16:56:55 +0000 Message-ID: <161539541541.286939.1889738674057013729.stgit@warthog.procyon.org.uk> In-Reply-To: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> References: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-Stat-Signature: c5x44h8tn5cbbermq3dgz53t1rdn7o5n X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 0F984C0007D0 Received-SPF: none (redhat.com>: No applicable sender policy available) receiver=imf14; identity=mailfrom; envelope-from=""; helo=us-smtp-delivery-124.mimecast.com; client-ip=63.128.21.124 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1615395425-712690 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Add a helper to do the pre-reading work for the netfs write_begin address space op. Changes - Added flag to netfs_subreq_terminated() to indicate that the caller may have been running async and stuff that might sleep needs punting to a workqueue (can't use in_softirq()[1]). Signed-off-by: David Howells Reviewed-by: Jeff Layton cc: Matthew Wilcox cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/160588543960.3465195.2792938973035886168.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161118140165.1232039.16418853874312234477.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161035539.2537118.15674887534950908530.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340398368.1303470.11242918276563276090.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/20210216084230.GA23669@lst.de/ [1] --- fs/netfs/internal.h | 2 + fs/netfs/read_helper.c | 165 ++++++++++++++++++++++++++++++++++++++++++ fs/netfs/stats.c | 11 ++- include/linux/netfs.h | 8 ++ include/trace/events/netfs.h | 4 + 5 files changed, 186 insertions(+), 4 deletions(-) diff --git a/fs/netfs/internal.h b/fs/netfs/internal.h index 98b6f4516da1..b7f2c4459f33 100644 --- a/fs/netfs/internal.h +++ b/fs/netfs/internal.h @@ -34,8 +34,10 @@ extern atomic_t netfs_n_rh_read_failed; extern atomic_t netfs_n_rh_zero; extern atomic_t netfs_n_rh_short_read; extern atomic_t netfs_n_rh_write; +extern atomic_t netfs_n_rh_write_begin; extern atomic_t netfs_n_rh_write_done; extern atomic_t netfs_n_rh_write_failed; +extern atomic_t netfs_n_rh_write_zskip; static inline void netfs_stat(atomic_t *stat) diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c index 8b03fcc2f8f1..ce3a2f5844d1 100644 --- a/fs/netfs/read_helper.c +++ b/fs/netfs/read_helper.c @@ -766,3 +766,168 @@ int netfs_readpage(struct file *file, return ret; } EXPORT_SYMBOL(netfs_readpage); + +static void netfs_clear_thp(struct page *page) +{ + unsigned int i; + + for (i = 0; i < thp_nr_pages(page); i++) + clear_highpage(page + i); +} + +/** + * netfs_write_begin - Helper to prepare for writing + * @file: The file to read from + * @mapping: The mapping to read from + * @pos: File position at which the write will begin + * @len: The length of the write in this page + * @flags: AOP_* flags + * @_page: Where to put the resultant page + * @_fsdata: Place for the netfs to store a cookie + * @ops: The network filesystem's operations for the helper to use + * @netfs_priv: Private netfs data to be retained in the request + * + * Pre-read data for a write-begin request by drawing data from the cache if + * possible, or the netfs if not. Space beyond the EOF is zero-filled. + * Multiple I/O requests from different sources will get munged together. If + * necessary, the readahead window can be expanded in either direction to a + * more convenient alighment for RPC efficiency or to make storage in the cache + * feasible. + * + * The calling netfs must provide a table of operations, only one of which, + * issue_op, is mandatory. + * + * The check_write_begin() operation can be provided to check for and flush + * conflicting writes once the page is grabbed and locked. It is passed a + * pointer to the fsdata cookie that gets returned to the VM to be passed to + * write_end. It is permitted to sleep. It should return 0 if the request + * should go ahead; unlock the page and return -EAGAIN to cause the page to be + * regot; or return an error. + * + * This is usable whether or not caching is enabled. + */ +int netfs_write_begin(struct file *file, struct address_space *mapping, + loff_t pos, unsigned int len, unsigned int flags, + struct page **_page, void **_fsdata, + const struct netfs_read_request_ops *ops, + void *netfs_priv) +{ + struct netfs_read_request *rreq; + struct page *page, *xpage; + struct inode *inode = file_inode(file); + unsigned int debug_index = 0; + pgoff_t index = pos >> PAGE_SHIFT; + int pos_in_page = pos & ~PAGE_MASK; + loff_t size; + int ret; + + struct readahead_control ractl = { + .file = file, + .mapping = mapping, + ._index = index, + ._nr_pages = 0, + }; + +retry: + page = grab_cache_page_write_begin(mapping, index, 0); + if (!page) + return -ENOMEM; + + if (ops->check_write_begin) { + /* Allow the netfs (eg. ceph) to flush conflicts. */ + ret = ops->check_write_begin(file, pos, len, page, _fsdata); + if (ret < 0) { + if (ret == -EAGAIN) + goto retry; + goto error; + } + } + + if (PageUptodate(page)) + goto have_page; + + /* If the page is beyond the EOF, we want to clear it - unless it's + * within the cache granule containing the EOF, in which case we need + * to preload the granule. + */ + size = i_size_read(inode); + if (!ops->is_cache_enabled(inode) && + ((pos_in_page == 0 && len == thp_size(page)) || + (pos >= size) || + (pos_in_page == 0 && (pos + len) >= size))) { + netfs_clear_thp(page); + SetPageUptodate(page); + netfs_stat(&netfs_n_rh_write_zskip); + goto have_page_no_wait; + } + + ret = -ENOMEM; + rreq = netfs_alloc_read_request(ops, netfs_priv, file); + if (!rreq) + goto error; + rreq->mapping = page->mapping; + rreq->start = page->index * PAGE_SIZE; + rreq->len = thp_size(page); + rreq->no_unlock_page = page->index; + __set_bit(NETFS_RREQ_NO_UNLOCK_PAGE, &rreq->flags); + netfs_priv = NULL; + + netfs_stat(&netfs_n_rh_write_begin); + trace_netfs_read(rreq, pos, len, netfs_read_trace_write_begin); + + /* Expand the request to meet caching requirements and download + * preferences. + */ + ractl._nr_pages = thp_nr_pages(page); + netfs_rreq_expand(rreq, &ractl); + netfs_get_read_request(rreq); + + /* We hold the page locks, so we can drop the references */ + while ((xpage = readahead_page(&ractl))) + if (xpage != page) + put_page(xpage); + + atomic_set(&rreq->nr_rd_ops, 1); + do { + if (!netfs_rreq_submit_slice(rreq, &debug_index)) + break; + + } while (rreq->submitted < rreq->len); + + /* Keep nr_rd_ops incremented so that the ref always belongs to us, and + * the service code isn't punted off to a random thread pool to + * process. + */ + for (;;) { + wait_var_event(&rreq->nr_rd_ops, atomic_read(&rreq->nr_rd_ops) == 1); + netfs_rreq_assess(rreq, false); + if (!test_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags)) + break; + cond_resched(); + } + + ret = rreq->error; + if (ret == 0 && rreq->submitted < rreq->len) + ret = -EIO; + netfs_put_read_request(rreq, false); + if (ret < 0) + goto error; + +have_page: + wait_on_page_fscache(page); +have_page_no_wait: + if (netfs_priv) + ops->cleanup(netfs_priv, mapping); + *_page = page; + _leave(" = 0"); + return 0; + +error: + unlock_page(page); + put_page(page); + if (netfs_priv) + ops->cleanup(netfs_priv, mapping); + _leave(" = %d", ret); + return ret; +} +EXPORT_SYMBOL(netfs_write_begin); diff --git a/fs/netfs/stats.c b/fs/netfs/stats.c index df6ff5718f25..9ae538c85378 100644 --- a/fs/netfs/stats.c +++ b/fs/netfs/stats.c @@ -24,19 +24,24 @@ atomic_t netfs_n_rh_read_failed; atomic_t netfs_n_rh_zero; atomic_t netfs_n_rh_short_read; atomic_t netfs_n_rh_write; +atomic_t netfs_n_rh_write_begin; atomic_t netfs_n_rh_write_done; atomic_t netfs_n_rh_write_failed; +atomic_t netfs_n_rh_write_zskip; void netfs_stats_show(struct seq_file *m) { - seq_printf(m, "RdHelp : RA=%u RP=%u rr=%u sr=%u\n", + seq_printf(m, "RdHelp : RA=%u RP=%u WB=%u WBZ=%u rr=%u sr=%u\n", atomic_read(&netfs_n_rh_readahead), atomic_read(&netfs_n_rh_readpage), + atomic_read(&netfs_n_rh_write_begin), + atomic_read(&netfs_n_rh_write_zskip), atomic_read(&netfs_n_rh_rreq), atomic_read(&netfs_n_rh_sreq)); - seq_printf(m, "RdHelp : ZR=%u sh=%u\n", + seq_printf(m, "RdHelp : ZR=%u sh=%u sk=%u\n", atomic_read(&netfs_n_rh_zero), - atomic_read(&netfs_n_rh_short_read)); + atomic_read(&netfs_n_rh_short_read), + atomic_read(&netfs_n_rh_write_zskip)); seq_printf(m, "RdHelp : DL=%u ds=%u df=%u di=%u\n", atomic_read(&netfs_n_rh_download), atomic_read(&netfs_n_rh_download_done), diff --git a/include/linux/netfs.h b/include/linux/netfs.h index f2a0c2dc425b..9e1a1c48a45a 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -119,11 +119,14 @@ struct netfs_read_request { * Operations the network filesystem can/must provide to the helpers. */ struct netfs_read_request_ops { + bool (*is_cache_enabled)(struct inode *inode); void (*init_rreq)(struct netfs_read_request *rreq, struct file *file); void (*expand_readahead)(struct netfs_read_request *rreq); bool (*clamp_length)(struct netfs_read_subrequest *subreq); void (*issue_op)(struct netfs_read_subrequest *subreq); bool (*is_still_valid)(struct netfs_read_request *rreq); + int (*check_write_begin)(struct file *file, loff_t pos, unsigned len, + struct page *page, void **_fsdata); void (*done)(struct netfs_read_request *rreq); void (*cleanup)(struct address_space *mapping, void *netfs_priv); }; @@ -136,6 +139,11 @@ extern int netfs_readpage(struct file *, struct page *, const struct netfs_read_request_ops *, void *); +extern int netfs_write_begin(struct file *, struct address_space *, + loff_t, unsigned int, unsigned int, struct page **, + void **, + const struct netfs_read_request_ops *, + void *); extern void netfs_subreq_terminated(struct netfs_read_subrequest *, ssize_t, bool); extern void netfs_stats_show(struct seq_file *); diff --git a/include/trace/events/netfs.h b/include/trace/events/netfs.h index 12ad382764c5..a2bf6cd84bd4 100644 --- a/include/trace/events/netfs.h +++ b/include/trace/events/netfs.h @@ -22,6 +22,7 @@ enum netfs_read_trace { netfs_read_trace_expanded, netfs_read_trace_readahead, netfs_read_trace_readpage, + netfs_read_trace_write_begin, }; enum netfs_rreq_trace { @@ -50,7 +51,8 @@ enum netfs_sreq_trace { #define netfs_read_traces \ EM(netfs_read_trace_expanded, "EXPANDED ") \ EM(netfs_read_trace_readahead, "READAHEAD") \ - E_(netfs_read_trace_readpage, "READPAGE ") + EM(netfs_read_trace_readpage, "READPAGE ") \ + E_(netfs_read_trace_write_begin, "WRITEBEGN") #define netfs_rreq_traces \ EM(netfs_rreq_trace_assess, "ASSESS") \ From patchwork Wed Mar 10 16:57:08 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 12128753 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1CCD8C4360C for ; Wed, 10 Mar 2021 16:57:35 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 80D1964FC5 for ; Wed, 10 Mar 2021 16:57:34 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 80D1964FC5 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 135278D01DB; Wed, 10 Mar 2021 11:57:34 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 10D0D8D01D5; Wed, 10 Mar 2021 11:57:34 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id EC8808D01DB; Wed, 10 Mar 2021 11:57:33 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0062.hostedemail.com [216.40.44.62]) by kanga.kvack.org (Postfix) with ESMTP id CB4FF8D01D5 for ; Wed, 10 Mar 2021 11:57:33 -0500 (EST) Received: from smtpin06.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 89A8D82499A8 for ; Wed, 10 Mar 2021 16:57:33 +0000 (UTC) X-FDA: 77904570786.06.9B2EB78 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [216.205.24.124]) by imf22.hostedemail.com (Postfix) with ESMTP id CED88C0007D4 for ; Wed, 10 Mar 2021 16:57:25 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1615395446; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=27pFUuTN4WLlbSd2h0Sj0hHiIC29hLO1F3AEYyoAjRk=; b=OaoNo6yyaLZCUKqIHnmL83Es3YJEGzDXAyd5W0F8syL8kr4bC5JSUQ2x8st0vSa2Qjnsz5 6fF0bqfzhYS5LGuehnLmOr4KFwHJIE3yISICy7DyKP3dNPFFsmA2xlDH5O5jjGMLecrP4Y HSk1+v/qux4hbJC9Kl8EGKLs4ZKZ3us= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-349-_UQYMPkhOpux-uY6fjCmWQ-1; Wed, 10 Mar 2021 11:57:22 -0500 X-MC-Unique: _UQYMPkhOpux-uY6fjCmWQ-1 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.phx2.redhat.com [10.5.11.15]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 0FCB6E5777; Wed, 10 Mar 2021 16:57:19 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-118-152.rdu2.redhat.com [10.10.118.152]) by smtp.corp.redhat.com (Postfix) with ESMTP id ACB956D024; Wed, 10 Mar 2021 16:57:09 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v4 12/28] netfs: Define an interface to talk to a cache From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: Jeff Layton , Matthew Wilcox , linux-mm@kvack.org, linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 10 Mar 2021 16:57:08 +0000 Message-ID: <161539542874.286939.13337898213448136687.stgit@warthog.procyon.org.uk> In-Reply-To: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> References: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.15 X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: CED88C0007D4 X-Stat-Signature: ou91xxuk9m3e85cgy7pttc7zfouimmyu Received-SPF: none (redhat.com>: No applicable sender policy available) receiver=imf22; identity=mailfrom; envelope-from=""; helo=us-smtp-delivery-124.mimecast.com; client-ip=216.205.24.124 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1615395445-568510 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Add an interface to the netfs helper library for reading data from the cache instead of downloading it from the server and support for writing data just downloaded or cleared to the cache. The API passes an iov_iter to the cache read/write routines to indicate the data/buffer to be used. This is done using the ITER_XARRAY type to provide direct access to the netfs inode's pagecache. When the netfs's ->begin_cache_operation() method is called, this must fill in the cache_resources in the netfs_read_request struct, including the netfs_cache_ops used by the helper lib to talk to the cache. The helper lib does not directly access the cache. Changes - Added flag to netfs_subreq_terminated() to indicate that the caller may have been running async and stuff that might sleep needs punting to a workqueue (can't use in_softirq()[1]). - Add missing inc of netfs_n_rh_read stat. - Move initial definition of fscache_begin_read_operation() elsewhere. - Need to call op->begin_cache_operation() from netfs_write_begin(). Signed-off-by: David Howells Reviewed-by: Jeff Layton cc: Matthew Wilcox cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/161118141321.1232039.8296910406755622458.stgit@warthog.procyon.org.uk/ # rfc Link: https://lore.kernel.org/r/161161036700.2537118.11170748455436854978.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/161340399569.1303470.1138884774643385730.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/20210216084230.GA23669@lst.de/ [1] --- fs/netfs/read_helper.c | 241 ++++++++++++++++++++++++++++++++++++++++++++++++ include/linux/netfs.h | 49 ++++++++++ 2 files changed, 289 insertions(+), 1 deletion(-) diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c index ce3a2f5844d1..0a7b76c11c98 100644 --- a/fs/netfs/read_helper.c +++ b/fs/netfs/read_helper.c @@ -88,6 +88,8 @@ static void netfs_free_read_request(struct work_struct *work) if (rreq->netfs_priv) rreq->netfs_ops->cleanup(rreq->mapping, rreq->netfs_priv); trace_netfs_rreq(rreq, netfs_rreq_trace_free); + if (rreq->cache_resources.ops) + rreq->cache_resources.ops->end_operation(&rreq->cache_resources); kfree(rreq); netfs_stat_d(&netfs_n_rh_rreq); } @@ -154,6 +156,34 @@ static void netfs_clear_unread(struct netfs_read_subrequest *subreq) iov_iter_zero(iov_iter_count(&iter), &iter); } +static void netfs_cache_read_terminated(void *priv, ssize_t transferred_or_error, + bool was_async) +{ + struct netfs_read_subrequest *subreq = priv; + + netfs_subreq_terminated(subreq, transferred_or_error, was_async); +} + +/* + * Issue a read against the cache. + * - Eats the caller's ref on subreq. + */ +static void netfs_read_from_cache(struct netfs_read_request *rreq, + struct netfs_read_subrequest *subreq, + bool seek_data) +{ + struct netfs_cache_resources *cres = &rreq->cache_resources; + struct iov_iter iter; + + netfs_stat(&netfs_n_rh_read); + iov_iter_xarray(&iter, READ, &rreq->mapping->i_pages, + subreq->start + subreq->transferred, + subreq->len - subreq->transferred); + + cres->ops->read(cres, subreq->start, &iter, seek_data, + netfs_cache_read_terminated, subreq); +} + /* * Fill a subrequest region with zeroes. */ @@ -198,6 +228,144 @@ static void netfs_rreq_completed(struct netfs_read_request *rreq, bool was_async netfs_put_read_request(rreq, was_async); } +/* + * Deal with the completion of writing the data to the cache. We have to clear + * the PG_fscache bits on the pages involved and release the caller's ref. + * + * May be called in softirq mode and we inherit a ref from the caller. + */ +static void netfs_rreq_unmark_after_write(struct netfs_read_request *rreq, + bool was_async) +{ + struct netfs_read_subrequest *subreq; + struct page *page; + pgoff_t unlocked = 0; + bool have_unlocked = false; + + rcu_read_lock(); + + list_for_each_entry(subreq, &rreq->subrequests, rreq_link) { + XA_STATE(xas, &rreq->mapping->i_pages, subreq->start / PAGE_SIZE); + + xas_for_each(&xas, page, (subreq->start + subreq->len - 1) / PAGE_SIZE) { + /* We might have multiple writes from the same huge + * page, but we mustn't unlock a page more than once. + */ + if (have_unlocked && page->index <= unlocked) + continue; + unlocked = page->index; + unlock_page_fscache(page); + have_unlocked = true; + } + } + + rcu_read_unlock(); + netfs_rreq_completed(rreq, was_async); +} + +static void netfs_rreq_copy_terminated(void *priv, ssize_t transferred_or_error, + bool was_async) +{ + struct netfs_read_subrequest *subreq = priv; + struct netfs_read_request *rreq = subreq->rreq; + + if (IS_ERR_VALUE(transferred_or_error)) { + subreq->error = transferred_or_error; + netfs_stat(&netfs_n_rh_write_failed); + } else { + subreq->error = 0; + netfs_stat(&netfs_n_rh_write_done); + } + + trace_netfs_sreq(subreq, netfs_sreq_trace_write_term); + + /* If we decrement nr_wr_ops to 0, the ref belongs to us. */ + if (atomic_dec_and_test(&rreq->nr_wr_ops)) + netfs_rreq_unmark_after_write(rreq, was_async); + + netfs_put_subrequest(subreq, was_async); +} + +/* + * Perform any outstanding writes to the cache. We inherit a ref from the + * caller. + */ +static void netfs_rreq_do_write_to_cache(struct netfs_read_request *rreq) +{ + struct netfs_cache_resources *cres = &rreq->cache_resources; + struct netfs_read_subrequest *subreq, *next, *p; + struct iov_iter iter; + loff_t pos; + + trace_netfs_rreq(rreq, netfs_rreq_trace_write); + + /* We don't want terminating writes trying to wake us up whilst we're + * still going through the list. + */ + atomic_inc(&rreq->nr_wr_ops); + + list_for_each_entry_safe(subreq, p, &rreq->subrequests, rreq_link) { + if (!test_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags)) { + list_del_init(&subreq->rreq_link); + netfs_put_subrequest(subreq, false); + } + } + + list_for_each_entry(subreq, &rreq->subrequests, rreq_link) { + /* Amalgamate adjacent writes */ + pos = round_down(subreq->start, PAGE_SIZE); + if (pos != subreq->start) { + subreq->len += subreq->start - pos; + subreq->start = pos; + } + subreq->len = round_up(subreq->len, PAGE_SIZE); + + while (!list_is_last(&subreq->rreq_link, &rreq->subrequests)) { + next = list_next_entry(subreq, rreq_link); + if (next->start > subreq->start + subreq->len) + break; + subreq->len += next->len; + subreq->len = round_up(subreq->len, PAGE_SIZE); + list_del_init(&next->rreq_link); + netfs_put_subrequest(next, false); + } + + iov_iter_xarray(&iter, WRITE, &rreq->mapping->i_pages, + subreq->start, subreq->len); + + atomic_inc(&rreq->nr_wr_ops); + netfs_stat(&netfs_n_rh_write); + netfs_get_read_subrequest(subreq); + trace_netfs_sreq(subreq, netfs_sreq_trace_write); + cres->ops->write(cres, subreq->start, &iter, + netfs_rreq_copy_terminated, subreq); + } + + /* If we decrement nr_wr_ops to 0, the usage ref belongs to us. */ + if (atomic_dec_and_test(&rreq->nr_wr_ops)) + netfs_rreq_unmark_after_write(rreq, false); +} + +static void netfs_rreq_write_to_cache_work(struct work_struct *work) +{ + struct netfs_read_request *rreq = + container_of(work, struct netfs_read_request, work); + + netfs_rreq_do_write_to_cache(rreq); +} + +static void netfs_rreq_write_to_cache(struct netfs_read_request *rreq, + bool was_async) +{ + if (was_async) { + rreq->work.func = netfs_rreq_write_to_cache_work; + if (!queue_work(system_unbound_wq, &rreq->work)) + BUG(); + } else { + netfs_rreq_do_write_to_cache(rreq); + } +} + /* * Unlock the pages in a read operation. We need to set PG_fscache on any * pages we're going to write back before we unlock them. @@ -299,7 +467,10 @@ static void netfs_rreq_short_read(struct netfs_read_request *rreq, netfs_get_read_subrequest(subreq); atomic_inc(&rreq->nr_rd_ops); - netfs_read_from_server(rreq, subreq); + if (subreq->source == NETFS_READ_FROM_CACHE) + netfs_read_from_cache(rreq, subreq, true); + else + netfs_read_from_server(rreq, subreq); } /* @@ -344,6 +515,25 @@ static bool netfs_rreq_perform_resubmissions(struct netfs_read_request *rreq) return false; } +/* + * Check to see if the data read is still valid. + */ +static void netfs_rreq_is_still_valid(struct netfs_read_request *rreq) +{ + struct netfs_read_subrequest *subreq; + + if (!rreq->netfs_ops->is_still_valid || + rreq->netfs_ops->is_still_valid(rreq)) + return; + + list_for_each_entry(subreq, &rreq->subrequests, rreq_link) { + if (subreq->source == NETFS_READ_FROM_CACHE) { + subreq->error = -ESTALE; + __set_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags); + } + } +} + /* * Assess the state of a read request and decide what to do next. * @@ -355,6 +545,8 @@ static void netfs_rreq_assess(struct netfs_read_request *rreq, bool was_async) trace_netfs_rreq(rreq, netfs_rreq_trace_assess); again: + netfs_rreq_is_still_valid(rreq); + if (!test_bit(NETFS_RREQ_FAILED, &rreq->flags) && test_bit(NETFS_RREQ_INCOMPLETE_IO, &rreq->flags)) { if (netfs_rreq_perform_resubmissions(rreq)) @@ -367,6 +559,9 @@ static void netfs_rreq_assess(struct netfs_read_request *rreq, bool was_async) clear_bit_unlock(NETFS_RREQ_IN_PROGRESS, &rreq->flags); wake_up_bit(&rreq->flags, NETFS_RREQ_IN_PROGRESS); + if (test_bit(NETFS_RREQ_WRITE_TO_CACHE, &rreq->flags)) + return netfs_rreq_write_to_cache(rreq, was_async); + netfs_rreq_completed(rreq, was_async); } @@ -501,7 +696,10 @@ static enum netfs_read_source netfs_cache_prepare_read(struct netfs_read_subrequ loff_t i_size) { struct netfs_read_request *rreq = subreq->rreq; + struct netfs_cache_resources *cres = &rreq->cache_resources; + if (cres->ops) + return cres->ops->prepare_read(subreq, i_size); if (subreq->start >= rreq->i_size) return NETFS_FILL_WITH_ZEROES; return NETFS_DOWNLOAD_FROM_SERVER; @@ -592,6 +790,9 @@ static bool netfs_rreq_submit_slice(struct netfs_read_request *rreq, case NETFS_DOWNLOAD_FROM_SERVER: netfs_read_from_server(rreq, subreq); break; + case NETFS_READ_FROM_CACHE: + netfs_read_from_cache(rreq, subreq, false); + break; default: BUG(); } @@ -604,9 +805,23 @@ static bool netfs_rreq_submit_slice(struct netfs_read_request *rreq, return false; } +static void netfs_cache_expand_readahead(struct netfs_read_request *rreq, + loff_t *_start, size_t *_len, loff_t i_size) +{ + struct netfs_cache_resources *cres = &rreq->cache_resources; + + if (cres->ops && cres->ops->expand_readahead) + cres->ops->expand_readahead(cres, _start, _len, i_size); +} + static void netfs_rreq_expand(struct netfs_read_request *rreq, struct readahead_control *ractl) { + /* Give the cache a chance to change the request parameters. The + * resultant request must contain the original region. + */ + netfs_cache_expand_readahead(rreq, &rreq->start, &rreq->len, rreq->i_size); + /* Give the netfs a chance to change the request parameters. The * resultant request must contain the original region. */ @@ -658,6 +873,7 @@ void netfs_readahead(struct readahead_control *ractl, struct netfs_read_request *rreq; struct page *page; unsigned int debug_index = 0; + int ret; _enter("%lx,%x", readahead_index(ractl), readahead_count(ractl)); @@ -675,6 +891,11 @@ void netfs_readahead(struct readahead_control *ractl, trace_netfs_read(rreq, readahead_pos(ractl), readahead_length(ractl), netfs_read_trace_readahead); + if (ops->begin_cache_operation) { + ret = ops->begin_cache_operation(rreq); + if (ret == -ENOMEM || ret == -EINTR || ret == -ERESTARTSYS) + goto cleanup_free; + } netfs_rreq_expand(rreq, ractl); atomic_set(&rreq->nr_rd_ops, 1); @@ -692,6 +913,9 @@ void netfs_readahead(struct readahead_control *ractl, netfs_rreq_assess(rreq, false); return; +cleanup_free: + netfs_put_read_request(rreq, false); + return; cleanup: if (netfs_priv) ops->cleanup(ractl->mapping, netfs_priv); @@ -741,6 +965,14 @@ int netfs_readpage(struct file *file, netfs_stat(&netfs_n_rh_readpage); trace_netfs_read(rreq, rreq->start, rreq->len, netfs_read_trace_readpage); + if (ops->begin_cache_operation) { + ret = ops->begin_cache_operation(rreq); + if (ret == -ENOMEM || ret == -EINTR || ret == -ERESTARTSYS) { + unlock_page(page); + goto out; + } + } + netfs_get_read_request(rreq); atomic_set(&rreq->nr_rd_ops, 1); @@ -762,6 +994,7 @@ int netfs_readpage(struct file *file, ret = rreq->error; if (ret == 0 && rreq->submitted < rreq->len) ret = -EIO; +out: netfs_put_read_request(rreq, false); return ret; } @@ -875,6 +1108,12 @@ int netfs_write_begin(struct file *file, struct address_space *mapping, netfs_stat(&netfs_n_rh_write_begin); trace_netfs_read(rreq, pos, len, netfs_read_trace_write_begin); + if (ops->begin_cache_operation) { + ret = ops->begin_cache_operation(rreq); + if (ret == -ENOMEM || ret == -EINTR || ret == -ERESTARTSYS) + goto error; + } + /* Expand the request to meet caching requirements and download * preferences. */ diff --git a/include/linux/netfs.h b/include/linux/netfs.h index 9e1a1c48a45a..9d3fbed4e30a 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -64,6 +64,18 @@ enum netfs_read_source { NETFS_INVALID_READ, } __mode(byte); +typedef void (*netfs_io_terminated_t)(void *priv, ssize_t transferred_or_error, + bool was_async); + +/* + * Resources required to do operations on a cache. + */ +struct netfs_cache_resources { + const struct netfs_cache_ops *ops; + void *cache_priv; + void *cache_priv2; +}; + /* * Descriptor for a single component subrequest. */ @@ -93,11 +105,13 @@ struct netfs_read_request { struct work_struct work; struct inode *inode; /* The file being accessed */ struct address_space *mapping; /* The mapping being accessed */ + struct netfs_cache_resources cache_resources; struct list_head subrequests; /* Requests to fetch I/O from disk or net */ void *netfs_priv; /* Private data for the netfs */ unsigned int debug_id; unsigned int cookie_debug_id; atomic_t nr_rd_ops; /* Number of read ops in progress */ + atomic_t nr_wr_ops; /* Number of write ops in progress */ size_t submitted; /* Amount submitted for I/O so far */ size_t len; /* Length of the request */ short error; /* 0 or error that occurred */ @@ -121,6 +135,7 @@ struct netfs_read_request { struct netfs_read_request_ops { bool (*is_cache_enabled)(struct inode *inode); void (*init_rreq)(struct netfs_read_request *rreq, struct file *file); + int (*begin_cache_operation)(struct netfs_read_request *rreq); void (*expand_readahead)(struct netfs_read_request *rreq); bool (*clamp_length)(struct netfs_read_subrequest *subreq); void (*issue_op)(struct netfs_read_subrequest *subreq); @@ -131,6 +146,40 @@ struct netfs_read_request_ops { void (*cleanup)(struct address_space *mapping, void *netfs_priv); }; +/* + * Table of operations for access to a cache. This is obtained by + * rreq->ops->begin_cache_operation(). + */ +struct netfs_cache_ops { + /* End an operation */ + void (*end_operation)(struct netfs_cache_resources *cres); + + /* Read data from the cache */ + int (*read)(struct netfs_cache_resources *cres, + loff_t start_pos, + struct iov_iter *iter, + bool seek_data, + netfs_io_terminated_t term_func, + void *term_func_priv); + + /* Write data to the cache */ + int (*write)(struct netfs_cache_resources *cres, + loff_t start_pos, + struct iov_iter *iter, + netfs_io_terminated_t term_func, + void *term_func_priv); + + /* Expand readahead request */ + void (*expand_readahead)(struct netfs_cache_resources *cres, + loff_t *_start, size_t *_len, loff_t i_size); + + /* Prepare a read operation, shortening it to a cached/uncached + * boundary as appropriate. + */ + enum netfs_read_source (*prepare_read)(struct netfs_read_subrequest *subreq, + loff_t i_size); +}; + struct readahead_control; extern void netfs_readahead(struct readahead_control *, const struct netfs_read_request_ops *, From patchwork Wed Mar 10 16:57:24 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Howells X-Patchwork-Id: 12128755 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.0 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E91E2C4332B for ; Wed, 10 Mar 2021 16:58:03 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 8723064FC9 for ; Wed, 10 Mar 2021 16:58:03 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8723064FC9 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 295338D01DD; Wed, 10 Mar 2021 11:58:03 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 26B2B8D01D5; Wed, 10 Mar 2021 11:58:03 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 10C788D01DD; Wed, 10 Mar 2021 11:58:03 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0021.hostedemail.com [216.40.44.21]) by kanga.kvack.org (Postfix) with ESMTP id E91418D01D5 for ; Wed, 10 Mar 2021 11:58:02 -0500 (EST) Received: from smtpin27.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 866018400 for ; Wed, 10 Mar 2021 16:58:02 +0000 (UTC) X-FDA: 77904572004.27.BD3E02F Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [63.128.21.124]) by imf07.hostedemail.com (Postfix) with ESMTP id EAD49A001A8A for ; Wed, 10 Mar 2021 16:57:38 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1615395458; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=tJ1em54X7cNlxYOwt1XkWsi2bN3JlmKyhBUdFCYW68w=; b=LEUcDEzDTOtvzmPyly5rYFc50+gUJ4NZ27DOEJmHYGvlm6J4V3WbY9AvjXFdbMoScmh5K+ F2nTBEKoSaZtXS186KnFeuRYzSs/JA+P+YqcvW6y/7qM/X97u6o5rvJgizrjSMuQ7ti5Di FVDZbadq57uIVDvDeRVSq2+DBJKoI9k= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-312-lhDkcfi-NXOnfsBkFqHPVA-1; Wed, 10 Mar 2021 11:57:36 -0500 X-MC-Unique: lhDkcfi-NXOnfsBkFqHPVA-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id E7161108BD07; Wed, 10 Mar 2021 16:57:33 +0000 (UTC) Received: from warthog.procyon.org.uk (ovpn-118-152.rdu2.redhat.com [10.10.118.152]) by smtp.corp.redhat.com (Postfix) with ESMTP id 3C3C460C13; Wed, 10 Mar 2021 16:57:24 +0000 (UTC) Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903 Subject: [PATCH v4 13/28] netfs: Hold a ref on a page when PG_private_2 is set From: David Howells To: Trond Myklebust , Anna Schumaker , Steve French , Dominique Martinet Cc: Linus Torvalds , Matthew Wilcox , Linus Torvalds , linux-mm@kvack.org, linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, dhowells@redhat.com, Jeff Layton , David Wysochanski , "Matthew Wilcox (Oracle)" , Alexander Viro , linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Date: Wed, 10 Mar 2021 16:57:24 +0000 Message-ID: <161539544426.286939.4484560053278261236.stgit@warthog.procyon.org.uk> In-Reply-To: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> References: <161539526152.286939.8589700175877370401.stgit@warthog.procyon.org.uk> User-Agent: StGit/0.23 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: EAD49A001A8A X-Stat-Signature: t7khkigntdwgfqwwyr7agchp8b6zgbtr Received-SPF: none (redhat.com>: No applicable sender policy available) receiver=imf07; identity=mailfrom; envelope-from=""; helo=us-smtp-delivery-124.mimecast.com; client-ip=63.128.21.124 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1615395458-836250 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Take a reference on a page when PG_private_2 is set and drop it once the bit is unlocked[1]. Reported-by: Linus Torvalds Signed-off-by: David Howells cc: Matthew Wilcox cc: Linus Torvalds cc: linux-mm@kvack.org cc: linux-cachefs@redhat.com cc: linux-afs@lists.infradead.org cc: linux-nfs@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: ceph-devel@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/CAHk-=wh+2gbF7XEjYc=HV9w_2uVzVf7vs60BPz0gFA=+pUm3ww@mail.gmail.com/ [1] Link: https://lore.kernel.org/r/1331025.1612974993@warthog.procyon.org.uk/ Link: https://lore.kernel.org/r/161340400833.1303470.8070750156044982282.stgit@warthog.procyon.org.uk/ # v3 --- fs/netfs/read_helper.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/fs/netfs/read_helper.c b/fs/netfs/read_helper.c index 0a7b76c11c98..ce11ca4c32e4 100644 --- a/fs/netfs/read_helper.c +++ b/fs/netfs/read_helper.c @@ -10,6 +10,7 @@ #include #include #include +#include #include #include #include @@ -238,10 +239,13 @@ static void netfs_rreq_unmark_after_write(struct netfs_read_request *rreq, bool was_async) { struct netfs_read_subrequest *subreq; + struct pagevec pvec; struct page *page; pgoff_t unlocked = 0; bool have_unlocked = false; + pagevec_init(&pvec); + rcu_read_lock(); list_for_each_entry(subreq, &rreq->subrequests, rreq_link) { @@ -255,6 +259,8 @@ static void netfs_rreq_unmark_after_write(struct netfs_read_request *rreq, continue; unlocked = page->index; unlock_page_fscache(page); + if (pagevec_add(&pvec, page) == 0) + pagevec_release(&pvec); have_unlocked = true; } } @@ -413,8 +419,10 @@ static void netfs_rreq_unlock(struct netfs_read_request *rreq) pg_failed = true; break; } - if (test_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags)) + if (test_bit(NETFS_SREQ_WRITE_TO_CACHE, &subreq->flags)) { + get_page(page); SetPageFsCache(page); + } pg_failed |= subreq_failed; if (pgend < iopos + subreq->len) break;