From patchwork Wed Dec 1 19:37:47 2021
From: Catalin Marinas
To: Linus Torvalds, Andreas Gruenbacher
Cc: Josef Bacik, David Sterba, Al Viro, Andrew Morton, Will Deacon,
    Matthew Wilcox, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
    linux-btrfs@vger.kernel.org
Subject: [PATCH v2 1/4] mm: Introduce a 'min_size' argument to fault_in_*()
Date: Wed, 1 Dec 2021 19:37:47 +0000
Message-Id: <20211201193750.2097885-2-catalin.marinas@arm.com>
In-Reply-To: <20211201193750.2097885-1-catalin.marinas@arm.com>
References: <20211201193750.2097885-1-catalin.marinas@arm.com>

There is no functional change after this patch as all callers pass a
min_size of 0. This argument will be used in subsequent patches to
probe for faults at sub-page granularity (e.g. arm64 MTE and SPARC
ADI).

With a non-zero 'min_size' argument, the fault_in_*() functions return
the full range if they don't manage to fault in the minimum size.

Signed-off-by: Catalin Marinas
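
A rough illustration of the new calling convention; the call sites
below are hypothetical and not part of the series:

#include <linux/errno.h>
#include <linux/pagemap.h>

/*
 * Illustrative call sites only (not from the patch). Existing callers
 * pass 0 and keep the old behaviour; a non-zero min_size makes the
 * return value all-or-nothing for the minimum.
 */
static int example_probe(char __user *buf, size_t len)
{
	/* old semantics: non-zero return = some bytes not faulted in */
	if (fault_in_readable(buf, len, 0))
		return -EFAULT;

	/*
	 * min_size == len: if fewer than len bytes can be faulted in,
	 * the call reports the full len as not faulted in.
	 */
	if (fault_in_writeable(buf, len, len))
		return -EFAULT;

	return 0;
}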
---
 arch/powerpc/kernel/kvm.c           |  2 +-
 arch/powerpc/kernel/signal_32.c     |  4 ++--
 arch/powerpc/kernel/signal_64.c     |  2 +-
 arch/x86/kernel/fpu/signal.c        |  2 +-
 drivers/gpu/drm/armada/armada_gem.c |  2 +-
 fs/btrfs/file.c                     |  6 ++---
 fs/btrfs/ioctl.c                    |  2 +-
 fs/f2fs/file.c                      |  2 +-
 fs/fuse/file.c                      |  2 +-
 fs/gfs2/file.c                      |  8 +++----
 fs/iomap/buffered-io.c              |  2 +-
 fs/ntfs/file.c                      |  2 +-
 fs/ntfs3/file.c                     |  2 +-
 include/linux/pagemap.h             |  8 ++++---
 include/linux/uio.h                 |  6 +++--
 lib/iov_iter.c                      | 28 +++++++++++++++++++-----
 mm/filemap.c                        |  2 +-
 mm/gup.c                            | 34 ++++++++++++++++++++---------
 18 files changed, 75 insertions(+), 41 deletions(-)

diff --git a/arch/powerpc/kernel/kvm.c b/arch/powerpc/kernel/kvm.c
index 6568823cf306..7a7fb08df4c4 100644
--- a/arch/powerpc/kernel/kvm.c
+++ b/arch/powerpc/kernel/kvm.c
@@ -670,7 +670,7 @@ static void __init kvm_use_magic_page(void)
 	/* Quick self-test to see if the mapping works */
 	if (fault_in_readable((const char __user *)KVM_MAGIC_PAGE,
-			      sizeof(u32))) {
+			      sizeof(u32), 0)) {
 		kvm_patching_worked = false;
 		return;
 	}
diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c
index 3e053e2fd6b6..7c817881d418 100644
--- a/arch/powerpc/kernel/signal_32.c
+++ b/arch/powerpc/kernel/signal_32.c
@@ -1048,7 +1048,7 @@ SYSCALL_DEFINE3(swapcontext, struct ucontext __user *, old_ctx,
 	if (new_ctx == NULL)
 		return 0;
 	if (!access_ok(new_ctx, ctx_size) ||
-	    fault_in_readable((char __user *)new_ctx, ctx_size))
+	    fault_in_readable((char __user *)new_ctx, ctx_size, 0))
 		return -EFAULT;

 	/*
@@ -1239,7 +1239,7 @@ SYSCALL_DEFINE3(debug_setcontext, struct ucontext __user *, ctx,
 #endif

 	if (!access_ok(ctx, sizeof(*ctx)) ||
-	    fault_in_readable((char __user *)ctx, sizeof(*ctx)))
+	    fault_in_readable((char __user *)ctx, sizeof(*ctx), 0))
 		return -EFAULT;

 	/*
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index d1e1fc0acbea..732fa4e10d24 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -688,7 +688,7 @@ SYSCALL_DEFINE3(swapcontext, struct ucontext __user *, old_ctx,
 	if (new_ctx == NULL)
 		return 0;
 	if (!access_ok(new_ctx, ctx_size) ||
-	    fault_in_readable((char __user *)new_ctx, ctx_size))
+	    fault_in_readable((char __user *)new_ctx, ctx_size, 0))
 		return -EFAULT;

 	/*
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index d5958278eba6..c9bd217e3364 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -309,7 +309,7 @@ static bool restore_fpregs_from_user(void __user *buf, u64 xrestore,
 		if (ret != X86_TRAP_PF)
 			return false;

-		if (!fault_in_readable(buf, size))
+		if (!fault_in_readable(buf, size, 0))
 			goto retry;
 		return false;
 	}
diff --git a/drivers/gpu/drm/armada/armada_gem.c b/drivers/gpu/drm/armada/armada_gem.c
index 147abf1a3968..0f44219c0120 100644
--- a/drivers/gpu/drm/armada/armada_gem.c
+++ b/drivers/gpu/drm/armada/armada_gem.c
@@ -351,7 +351,7 @@ int armada_gem_pwrite_ioctl(struct drm_device *dev, void *data,
 	if (!access_ok(ptr, args->size))
 		return -EFAULT;

-	if (fault_in_readable(ptr, args->size))
+	if (fault_in_readable(ptr, args->size, 0))
 		return -EFAULT;

 	dobj = armada_gem_object_lookup(file, args->handle);
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 11204dbbe053..96ac4b186b72 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1718,7 +1718,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
 		 * Fault pages before locking them in prepare_pages
 		 * to avoid recursive lock
 		 */
-		if (unlikely(fault_in_iov_iter_readable(i, write_bytes))) {
+		if (unlikely(fault_in_iov_iter_readable(i, write_bytes, 0))) {
 			ret = -EFAULT;
 			break;
 		}
@@ -2021,7 +2021,7 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
 			if (left == prev_left) {
 				err = -ENOTBLK;
 			} else {
-				fault_in_iov_iter_readable(from, left);
+				fault_in_iov_iter_readable(from, left, 0);
 				prev_left = left;
 				goto again;
 			}
@@ -3772,7 +3772,7 @@ static ssize_t btrfs_direct_read(struct kiocb *iocb, struct iov_iter *to)
 		 * the first time we are retrying. Fault in as many pages
 		 * as possible and retry.
 		 */
-		fault_in_iov_iter_writeable(to, left);
+		fault_in_iov_iter_writeable(to, left, 0);
 		prev_left = left;
 		goto again;
 	}
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 92138ac2a4e2..c7d74c8776a1 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2223,7 +2223,7 @@ static noinline int search_ioctl(struct inode *inode,
 	while (1) {
 		ret = -EFAULT;
-		if (fault_in_writeable(ubuf + sk_offset, *buf_size - sk_offset))
+		if (fault_in_writeable(ubuf + sk_offset, *buf_size - sk_offset, 0))
 			break;

 		ret = btrfs_search_forward(root, &key, path, sk->min_transid);
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 92ec2699bc85..fb6eceac0d57 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -4276,7 +4276,7 @@ static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 		size_t target_size = 0;
 		int err;

-		if (fault_in_iov_iter_readable(from, iov_iter_count(from)))
+		if (fault_in_iov_iter_readable(from, iov_iter_count(from), 0))
 			set_inode_flag(inode, FI_NO_PREALLOC);

 		if ((iocb->ki_flags & IOCB_NOWAIT)) {
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 9d6c5f6361f7..c823b9f70215 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1162,7 +1162,7 @@ static ssize_t fuse_fill_write_pages(struct fuse_io_args *ia,
 again:
 		err = -EFAULT;
-		if (fault_in_iov_iter_readable(ii, bytes))
+		if (fault_in_iov_iter_readable(ii, bytes, 0))
 			break;

 		err = -ENOMEM;
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 3e718cfc19a7..f7bd3bfd0690 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -847,7 +847,7 @@ static ssize_t gfs2_file_direct_read(struct kiocb *iocb, struct iov_iter *to,
 		size_t leftover;

 		gfs2_holder_allow_demote(gh);
-		leftover = fault_in_iov_iter_writeable(to, window_size);
+		leftover = fault_in_iov_iter_writeable(to, window_size, 0);
 		gfs2_holder_disallow_demote(gh);
 		if (leftover != window_size) {
 			if (!gfs2_holder_queued(gh))
@@ -916,7 +916,7 @@ static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from,
 		size_t leftover;

 		gfs2_holder_allow_demote(gh);
-		leftover = fault_in_iov_iter_readable(from, window_size);
+		leftover = fault_in_iov_iter_readable(from, window_size, 0);
 		gfs2_holder_disallow_demote(gh);
 		if (leftover != window_size) {
 			if (!gfs2_holder_queued(gh))
@@ -985,7 +985,7 @@ static ssize_t gfs2_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
 		size_t leftover;

 		gfs2_holder_allow_demote(&gh);
-		leftover = fault_in_iov_iter_writeable(to, window_size);
+		leftover = fault_in_iov_iter_writeable(to, window_size, 0);
 		gfs2_holder_disallow_demote(&gh);
 		if (leftover != window_size) {
 			if (!gfs2_holder_queued(&gh)) {
@@ -1063,7 +1063,7 @@ static ssize_t gfs2_file_buffered_write(struct kiocb *iocb,
 		size_t leftover;

 		gfs2_holder_allow_demote(gh);
-		leftover = fault_in_iov_iter_readable(from, window_size);
+		leftover = fault_in_iov_iter_readable(from, window_size, 0);
 		gfs2_holder_disallow_demote(gh);
 		if (leftover != window_size) {
 			from->count = min(from->count, window_size - leftover);
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index 1753c26c8e76..e7a529405775 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -750,7 +750,7 @@ static loff_t iomap_write_iter(struct iomap_iter *iter, struct iov_iter *i)
 		 * same page as we're writing to, without it being marked
 		 * up-to-date.
 		 */
-		if (unlikely(fault_in_iov_iter_readable(i, bytes))) {
+		if (unlikely(fault_in_iov_iter_readable(i, bytes, 0))) {
 			status = -EFAULT;
 			break;
 		}
diff --git a/fs/ntfs/file.c b/fs/ntfs/file.c
index 2ae25e48a41a..441aeefda8b6 100644
--- a/fs/ntfs/file.c
+++ b/fs/ntfs/file.c
@@ -1830,7 +1830,7 @@ static ssize_t ntfs_perform_write(struct file *file, struct iov_iter *i,
 		 * pages being swapped out between us bringing them into memory
 		 * and doing the actual copying.
 		 */
-		if (unlikely(fault_in_iov_iter_readable(i, bytes))) {
+		if (unlikely(fault_in_iov_iter_readable(i, bytes, 0))) {
 			status = -EFAULT;
 			break;
 		}
diff --git a/fs/ntfs3/file.c b/fs/ntfs3/file.c
index 787b53b984ee..208686bda052 100644
--- a/fs/ntfs3/file.c
+++ b/fs/ntfs3/file.c
@@ -990,7 +990,7 @@ static ssize_t ntfs_compress_write(struct kiocb *iocb, struct iov_iter *from)
 		frame_vbo = pos & ~(frame_size - 1);
 		index = frame_vbo >> PAGE_SHIFT;

-		if (unlikely(fault_in_iov_iter_readable(from, bytes))) {
+		if (unlikely(fault_in_iov_iter_readable(from, bytes, 0))) {
 			err = -EFAULT;
 			goto out;
 		}
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 1a0c646eb6ff..79d328031247 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -909,9 +909,11 @@ void folio_add_wait_queue(struct folio *folio, wait_queue_entry_t *waiter);
 /*
  * Fault in userspace address range.
  */
-size_t fault_in_writeable(char __user *uaddr, size_t size);
-size_t fault_in_safe_writeable(const char __user *uaddr, size_t size);
-size_t fault_in_readable(const char __user *uaddr, size_t size);
+size_t fault_in_writeable(char __user *uaddr, size_t size, size_t min_size);
+size_t fault_in_safe_writeable(const char __user *uaddr, size_t size,
+			       size_t min_size);
+size_t fault_in_readable(const char __user *uaddr, size_t size,
+			 size_t min_size);

 int add_to_page_cache_locked(struct page *page, struct address_space *mapping,
 		pgoff_t index, gfp_t gfp);
diff --git a/include/linux/uio.h b/include/linux/uio.h
index 6350354f97e9..06c54c3ab3f8 100644
--- a/include/linux/uio.h
+++ b/include/linux/uio.h
@@ -134,8 +134,10 @@ size_t copy_page_from_iter_atomic(struct page *page, unsigned offset,
 				  size_t bytes, struct iov_iter *i);
 void iov_iter_advance(struct iov_iter *i, size_t bytes);
 void iov_iter_revert(struct iov_iter *i, size_t bytes);
-size_t fault_in_iov_iter_readable(const struct iov_iter *i, size_t bytes);
-size_t fault_in_iov_iter_writeable(const struct iov_iter *i, size_t bytes);
+size_t fault_in_iov_iter_readable(const struct iov_iter *i, size_t bytes,
+				  size_t min_size);
+size_t fault_in_iov_iter_writeable(const struct iov_iter *i, size_t bytes,
+				   size_t min_size);
 size_t iov_iter_single_seg_count(const struct iov_iter *i);
 size_t copy_page_to_iter(struct page *page, size_t offset, size_t bytes,
 			 struct iov_iter *i);
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 66a740e6e153..ecb95bb5c423 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -191,7 +191,7 @@ static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t bytes,
 	buf = iov->iov_base + skip;
 	copy = min(bytes, iov->iov_len - skip);

-	if (IS_ENABLED(CONFIG_HIGHMEM) && !fault_in_writeable(buf, copy)) {
+	if (IS_ENABLED(CONFIG_HIGHMEM) && !fault_in_writeable(buf, copy, 0)) {
 		kaddr = kmap_atomic(page);
 		from = kaddr + offset;
@@ -275,7 +275,7 @@ static size_t copy_page_from_iter_iovec(struct page *page, size_t offset, size_t bytes,
 	buf = iov->iov_base + skip;
 	copy = min(bytes, iov->iov_len - skip);

-	if (IS_ENABLED(CONFIG_HIGHMEM) && !fault_in_readable(buf, copy)) {
+	if (IS_ENABLED(CONFIG_HIGHMEM) && !fault_in_readable(buf, copy, 0)) {
 		kaddr = kmap_atomic(page);
 		to = kaddr + offset;
@@ -433,6 +433,7 @@ static size_t copy_page_to_iter_pipe(struct page *page, size_t offset, size_t bytes,
  * fault_in_iov_iter_readable - fault in iov iterator for reading
  * @i: iterator
  * @size: maximum length
+ * @min_size: minimum size to be faulted in
  *
  * Fault in one or more iovecs of the given iov_iter, to a maximum length of
  * @size. For each iovec, fault in each page that constitutes the iovec.
@@ -442,25 +443,32 @@ static size_t copy_page_to_iter_pipe(struct page *page, size_t offset, size_t bytes,
  *
  * Always returns 0 for non-userspace iterators.
  */
-size_t fault_in_iov_iter_readable(const struct iov_iter *i, size_t size)
+size_t fault_in_iov_iter_readable(const struct iov_iter *i, size_t size,
+				  size_t min_size)
 {
 	if (iter_is_iovec(i)) {
 		size_t count = min(size, iov_iter_count(i));
 		const struct iovec *p;
 		size_t skip;
+		size_t orig_size = size;

 		size -= count;
 		for (p = i->iov, skip = i->iov_offset; count; p++, skip = 0) {
 			size_t len = min(count, p->iov_len - skip);
+			size_t min_len = min(len, min_size);
 			size_t ret;

 			if (unlikely(!len))
 				continue;
-			ret = fault_in_readable(p->iov_base + skip, len);
+			ret = fault_in_readable(p->iov_base + skip, len,
+						min_len);
 			count -= len - ret;
+			min_size -= min(min_size, len - ret);
 			if (ret)
 				break;
 		}
+		if (min_size)
+			return orig_size;
 		return count + size;
 	}
 	return 0;
@@ -471,6 +479,7 @@ EXPORT_SYMBOL(fault_in_iov_iter_readable);
  * fault_in_iov_iter_writeable - fault in iov iterator for writing
  * @i: iterator
  * @size: maximum length
+ * @min_size: minimum size to be faulted in
  *
  * Faults in the iterator using get_user_pages(), i.e., without triggering
  * hardware page faults. This is primarily useful when we already know that
@@ -481,25 +490,32 @@ EXPORT_SYMBOL(fault_in_iov_iter_readable);
  *
 * Always returns 0 for non-user-space iterators.
 */
-size_t fault_in_iov_iter_writeable(const struct iov_iter *i, size_t size)
+size_t fault_in_iov_iter_writeable(const struct iov_iter *i, size_t size,
+				   size_t min_size)
 {
 	if (iter_is_iovec(i)) {
 		size_t count = min(size, iov_iter_count(i));
 		const struct iovec *p;
 		size_t skip;
+		size_t orig_size = size;

 		size -= count;
 		for (p = i->iov, skip = i->iov_offset; count; p++, skip = 0) {
 			size_t len = min(count, p->iov_len - skip);
+			size_t min_len = min(len, min_size);
 			size_t ret;

 			if (unlikely(!len))
 				continue;
-			ret = fault_in_safe_writeable(p->iov_base + skip, len);
+			ret = fault_in_safe_writeable(p->iov_base + skip, len,
+						      min_len);
 			count -= len - ret;
+			min_size -= min(min_size, len - ret);
 			if (ret)
 				break;
 		}
+		if (min_size)
+			return orig_size;
 		return count + size;
 	}
 	return 0;
 }
diff --git a/mm/filemap.c b/mm/filemap.c
index daa0e23a6ee6..e5d7f5b1e5cc 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3743,7 +3743,7 @@ ssize_t generic_perform_write(struct file *file,
 		 * same page as we're writing to, without it being marked
 		 * up-to-date.
 		 */
-		if (unlikely(fault_in_iov_iter_readable(i, bytes))) {
+		if (unlikely(fault_in_iov_iter_readable(i, bytes, 0))) {
 			status = -EFAULT;
 			break;
 		}
diff --git a/mm/gup.c b/mm/gup.c
index 2c51e9748a6a..baa8240615a4 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1662,13 +1662,15 @@ static long __get_user_pages_locked(struct mm_struct *mm, unsigned long start,
  * fault_in_writeable - fault in userspace address range for writing
  * @uaddr: start of address range
  * @size: size of address range
+ * @min_size: minimum size to be faulted in
  *
  * Returns the number of bytes not faulted in (like copy_to_user() and
  * copy_from_user()).
  */
-size_t fault_in_writeable(char __user *uaddr, size_t size)
+size_t fault_in_writeable(char __user *uaddr, size_t size, size_t min_size)
 {
 	char __user *start = uaddr, *end;
+	size_t faulted_in = size;

 	if (unlikely(size == 0))
 		return 0;
@@ -1688,8 +1690,10 @@ size_t fault_in_writeable(char __user *uaddr, size_t size)

 out:
 	if (size > uaddr - start)
-		return size - (uaddr - start);
-	return 0;
+		faulted_in = uaddr - start;
+	if (faulted_in < min_size)
+		return size;
+	return size - faulted_in;
 }
 EXPORT_SYMBOL(fault_in_writeable);

@@ -1697,6 +1701,7 @@ EXPORT_SYMBOL(fault_in_writeable);
  * fault_in_safe_writeable - fault in an address range for writing
  * @uaddr: start of address range
  * @size: length of address range
+ * @min_size: minimum size to be faulted in
  *
  * Faults in an address range using get_user_pages, i.e., without triggering
  * hardware page faults. This is primarily useful when we already know that
@@ -1711,13 +1716,15 @@ EXPORT_SYMBOL(fault_in_writeable);
  * Returns the number of bytes not faulted in, like copy_to_user() and
  * copy_from_user().
  */
-size_t fault_in_safe_writeable(const char __user *uaddr, size_t size)
+size_t fault_in_safe_writeable(const char __user *uaddr, size_t size,
+			       size_t min_size)
 {
 	unsigned long start = (unsigned long)untagged_addr(uaddr);
 	unsigned long end, nstart, nend;
 	struct mm_struct *mm = current->mm;
 	struct vm_area_struct *vma = NULL;
 	int locked = 0;
+	size_t faulted_in = size;

 	nstart = start & PAGE_MASK;
 	end = PAGE_ALIGN(start + size);
@@ -1750,9 +1757,11 @@ size_t fault_in_safe_writeable(const char __user *uaddr, size_t size)
 	}
 	if (locked)
 		mmap_read_unlock(mm);
-	if (nstart == end)
-		return 0;
-	return size - min_t(size_t, nstart - start, size);
+	if (nstart != end)
+		faulted_in = min_t(size_t, nstart - start, size);
+	if (faulted_in < min_size)
+		return size;
+	return size - faulted_in;
 }
 EXPORT_SYMBOL(fault_in_safe_writeable);

@@ -1760,14 +1769,17 @@ EXPORT_SYMBOL(fault_in_safe_writeable);
  * fault_in_readable - fault in userspace address range for reading
  * @uaddr: start of user address range
  * @size: size of user address range
+ * @min_size: minimum size to be faulted in
  *
  * Returns the number of bytes not faulted in (like copy_to_user() and
  * copy_from_user()).
  */
-size_t fault_in_readable(const char __user *uaddr, size_t size)
+size_t fault_in_readable(const char __user *uaddr, size_t size,
+			 size_t min_size)
 {
 	const char __user *start = uaddr, *end;
 	volatile char c;
+	size_t faulted_in = size;

 	if (unlikely(size == 0))
 		return 0;
@@ -1788,8 +1800,10 @@ size_t fault_in_readable(const char __user *uaddr, size_t size)

 out:
 	(void)c;
 	if (size > uaddr - start)
-		return size - (uaddr - start);
-	return 0;
+		faulted_in = uaddr - start;
+	if (faulted_in < min_size)
+		return size;
+	return size - faulted_in;
 }
 EXPORT_SYMBOL(fault_in_readable);

From patchwork Wed Dec 1 19:37:48 2021
From: Catalin Marinas
To: Linus Torvalds, Andreas Gruenbacher
Cc: Josef Bacik, David Sterba, Al Viro, Andrew Morton, Will Deacon,
    Matthew Wilcox, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
    linux-btrfs@vger.kernel.org
Subject: [PATCH v2 2/4] mm: Probe for sub-page faults in fault_in_*()
Date: Wed, 1 Dec 2021 19:37:48 +0000
Message-Id: <20211201193750.2097885-3-catalin.marinas@arm.com>
In-Reply-To: <20211201193750.2097885-1-catalin.marinas@arm.com>
References: <20211201193750.2097885-1-catalin.marinas@arm.com>

On hardware with features like arm64 MTE or SPARC ADI, an access fault
can be triggered at sub-page granularity. Depending on how the
fault_in_*() functions are used, the caller can get into a live-lock by
continuously retrying the fault-in on an address different from the one
where the uaccess failed.

In the majority of cases progress is ensured by the following
conditions:

1. copy_{to,from}_user_nofault() guarantees at least one byte access if
   the user address is not faulting.

2. The fault_in_*() loop is resumed from the next address that could
   not be accessed by copy_{to,from}_user_nofault().

If the loop iteration is restarted from an earlier point, the loop is
repeated with the same conditions and it would live-lock. The same
problem exists if the fault_in_*() is attempted on the fault address
reported by copy_*_user_nofault() since the latter does not guarantee
the maximum possible bytes are written and fault_in_*() will succeed in
probing a single byte.

Introduce probe_subpage_*() and call them from the corresponding
fault_in_*() functions on the requested 'min_size' range. The arch code
with sub-page faults will have to implement the specific probing
functionality.

Signed-off-by: Catalin Marinas
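
The live-lock pattern and its fix can be sketched as follows; this is a
hypothetical caller, not code from the series, using fault_in_writeable()
and copy_to_user_nofault() as declared in linux/pagemap.h and
linux/uaccess.h:

#include <linux/errno.h>
#include <linux/pagemap.h>
#include <linux/uaccess.h>

/*
 * Hypothetical retry loop, illustration only. Without sub-page
 * probing, fault_in_writeable() may succeed by touching the first
 * byte of each page while copy_to_user_nofault() keeps faulting a few
 * bytes further in (e.g. on a mismatched MTE tag), so the loop spins
 * forever. With min_size == len, the probe itself fails instead.
 */
static int example_copy_out(char __user *ubuf, const void *kbuf, size_t len)
{
	for (;;) {
		if (fault_in_writeable(ubuf, len, len))
			return -EFAULT;	/* retrying cannot make progress */
		if (!copy_to_user_nofault(ubuf, kbuf, len))
			return 0;	/* copy fully succeeded */
		/* faulted mid-copy: fault in again and retry */
	}
}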
---
 arch/Kconfig            |  7 ++++++
 include/linux/uaccess.h | 53 +++++++++++++++++++++++++++++++++++++++++
 mm/gup.c                |  9 ++++---
 3 files changed, 66 insertions(+), 3 deletions(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 26b8ed11639d..02502b3362aa 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -27,6 +27,13 @@ config HAVE_IMA_KEXEC
 config SET_FS
 	bool

+config ARCH_HAS_SUBPAGE_FAULTS
+	bool
+	help
+	  Select if the architecture can check permissions at sub-page
+	  granularity (e.g. arm64 MTE). The probe_user_*() functions
+	  must be implemented.
+
 config HOTPLUG_SMT
 	bool

diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index ac0394087f7d..04ad214c98cd 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -271,6 +271,59 @@ static inline bool pagefault_disabled(void)
  */
 #define faulthandler_disabled() (pagefault_disabled() || in_atomic())

+#ifndef CONFIG_ARCH_HAS_SUBPAGE_FAULTS
+
+/**
+ * probe_subpage_writeable: probe the user range for write faults at sub-page
+ *			    granularity (e.g. arm64 MTE)
+ * @uaddr: start of address range
+ * @size: size of address range
+ *
+ * Returns 0 on success, the number of bytes not probed on fault.
+ *
+ * It is expected that the caller checked for the write permission of each
+ * page in the range either by put_user() or GUP. The architecture port can
+ * implement a more efficient get_user() probing if the same sub-page faults
+ * are triggered by either a read or a write.
+ */
+static inline size_t probe_subpage_writeable(void __user *uaddr, size_t size)
+{
+	return 0;
+}
+
+/**
+ * probe_subpage_safe_writeable: probe the user range for write faults at
+ *				 sub-page granularity without corrupting the
+ *				 existing data
+ * @uaddr: start of address range
+ * @size: size of address range
+ *
+ * Returns 0 on success, the number of bytes not probed on fault.
+ *
+ * It is expected that the caller checked for the write permission of each
+ * page in the range either by put_user() or GUP.
+ */
+static inline size_t probe_subpage_safe_writeable(void __user *uaddr,
+						  size_t size)
+{
+	return 0;
+}
+
+/**
+ * probe_subpage_readable: probe the user range for read faults at sub-page
+ *			   granularity
+ * @uaddr: start of address range
+ * @size: size of address range
+ *
+ * Returns 0 on success, the number of bytes not probed on fault.
+ */
+static inline size_t probe_subpage_readable(void __user *uaddr, size_t size)
+{
+	return 0;
+}
+
+#endif
+
 #ifndef ARCH_HAS_NOCACHE_UACCESS

 static inline __must_check unsigned long
diff --git a/mm/gup.c b/mm/gup.c
index baa8240615a4..7fa69b0fb859 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1691,7 +1691,8 @@ size_t fault_in_writeable(char __user *uaddr, size_t size, size_t min_size)
 out:
 	if (size > uaddr - start)
 		faulted_in = uaddr - start;
-	if (faulted_in < min_size)
+	if (faulted_in < min_size ||
+	    (min_size && probe_subpage_writeable(start, min_size)))
 		return size;
 	return size - faulted_in;
 }
@@ -1759,7 +1760,8 @@ size_t fault_in_safe_writeable(const char __user *uaddr, size_t size,
 		mmap_read_unlock(mm);
 	if (nstart != end)
 		faulted_in = min_t(size_t, nstart - start, size);
-	if (faulted_in < min_size)
+	if (faulted_in < min_size ||
+	    (min_size && probe_subpage_safe_writeable(uaddr, min_size)))
 		return size;
 	return size - faulted_in;
 }
@@ -1801,7 +1803,8 @@ size_t fault_in_readable(const char __user *uaddr, size_t size,
 	(void)c;
 	if (size > uaddr - start)
 		faulted_in = uaddr - start;
-	if (faulted_in < min_size)
+	if (faulted_in < min_size ||
+	    (min_size && probe_subpage_readable(start, min_size)))
 		return size;
 	return size - faulted_in;
 }
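
A toy model of the resulting return-value convention, in plain
user-space C with made-up names (not kernel code):

#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model of the tail of fault_in_writeable() after this patch;
 * 'subpage_fault' stands in for a non-zero probe_subpage_writeable().
 */
static size_t model_fault_in(size_t size, size_t faulted_in,
			     size_t min_size, bool subpage_fault)
{
	if (faulted_in < min_size || (min_size && subpage_fault))
		return size;		/* report the full range */
	return size - faulted_in;	/* bytes not faulted in */
}

/*
 * model_fault_in(8192, 4096, 0, false)    == 4096: old partial semantics
 * model_fault_in(8192, 4096, 8192, false) == 8192: minimum not met
 * model_fault_in(8192, 8192, 8192, true)  == 8192: sub-page probe failed
 */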
From patchwork Wed Dec 1 19:37:49 2021
From: Catalin Marinas
To: Linus Torvalds, Andreas Gruenbacher
Cc: Josef Bacik, David Sterba, Al Viro, Andrew Morton, Will Deacon,
    Matthew Wilcox, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
    linux-btrfs@vger.kernel.org
Subject: [PATCH v2 3/4] arm64: Add support for user sub-page fault probing
Date: Wed, 1 Dec 2021 19:37:49 +0000
Message-Id: <20211201193750.2097885-4-catalin.marinas@arm.com>
In-Reply-To: <20211201193750.2097885-1-catalin.marinas@arm.com>
References: <20211201193750.2097885-1-catalin.marinas@arm.com>

With MTE, even if the pte allows an access, a mismatched tag somewhere
within a page can still cause a fault. Select ARCH_HAS_SUBPAGE_FAULTS
if MTE is enabled and implement the probe_subpage_*() functions. Note
that get_user() is sufficient for the writeable checks since the same
tag mismatch fault would be triggered by a read.

Signed-off-by: Catalin Marinas
---
 arch/arm64/Kconfig               |  1 +
 arch/arm64/include/asm/uaccess.h | 59 ++++++++++++++++++++++++++++++++
 2 files changed, 60 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c4207cf9bb17..dff89fd0d817 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1777,6 +1777,7 @@ config ARM64_MTE
 	depends on AS_HAS_LSE_ATOMICS
 	# Required for tag checking in the uaccess routines
 	depends on ARM64_PAN
+	select ARCH_HAS_SUBPAGE_FAULTS
 	select ARCH_USES_HIGH_VMA_FLAGS
 	help
 	  Memory Tagging (part of the ARMv8.5 Extensions) provides
diff --git a/arch/arm64/include/asm/uaccess.h b/arch/arm64/include/asm/uaccess.h
index 6e2e0b7031ab..bcbd24b97917 100644
--- a/arch/arm64/include/asm/uaccess.h
+++ b/arch/arm64/include/asm/uaccess.h
@@ -445,4 +445,63 @@ static inline int __copy_from_user_flushcache(void *dst, const void __user *src,
 }
 #endif

+#ifdef CONFIG_ARCH_HAS_SUBPAGE_FAULTS
+
+/*
+ * Return 0 on success, the number of bytes not accessed otherwise.
+ */
+static inline size_t __mte_probe_user_range(const char __user *uaddr,
+					    size_t size, bool skip_first)
+{
+	const char __user *end = uaddr + size;
+	int err = 0;
+	char val;
+
+	uaddr = PTR_ALIGN_DOWN(uaddr, MTE_GRANULE_SIZE);
+	if (skip_first)
+		uaddr += MTE_GRANULE_SIZE;
+	while (uaddr < end) {
+		/*
+		 * A read is sufficient for MTE, the caller should have probed
+		 * for the pte write permission if required.
+		 */
+		__raw_get_user(val, uaddr, err);
+		if (err)
+			return end - uaddr;
+		uaddr += MTE_GRANULE_SIZE;
+	}
+	(void)val;
+
+	return 0;
+}
+
+static inline size_t probe_subpage_writeable(const void __user *uaddr,
+					     size_t size)
+{
+	if (!system_supports_mte())
+		return 0;
+	/* first put_user() done in the caller */
+	return __mte_probe_user_range(uaddr, size, true);
+}
+
+static inline size_t probe_subpage_safe_writeable(const void __user *uaddr,
+						  size_t size)
+{
+	if (!system_supports_mte())
+		return 0;
+	/* the caller used GUP, don't skip the first granule */
+	return __mte_probe_user_range(uaddr, size, false);
+}
+
+static inline size_t probe_subpage_readable(const void __user *uaddr,
+					    size_t size)
+{
+	if (!system_supports_mte())
+		return 0;
+	/* first get_user() done in the caller */
+	return __mte_probe_user_range(uaddr, size, true);
+}
+
+#endif /* CONFIG_ARCH_HAS_SUBPAGE_FAULTS */
+
 #endif /* __ASM_UACCESS_H */
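
To make the granule arithmetic concrete, here is a user-space analogue
of the walk above; GRANULE and the function name are made up for
illustration, with GRANULE standing in for MTE_GRANULE_SIZE:

#include <stddef.h>
#include <stdint.h>

#define GRANULE 16	/* stands in for MTE_GRANULE_SIZE */

/* Count the granules a probe like __mte_probe_user_range() touches. */
static size_t granules_probed(uintptr_t addr, size_t size, int skip_first)
{
	uintptr_t p = addr & ~(uintptr_t)(GRANULE - 1);	/* PTR_ALIGN_DOWN */
	uintptr_t end = addr + size;
	size_t n = 0;

	if (skip_first)
		p += GRANULE;
	for (; p < end; p += GRANULE)
		n++;
	return n;
}

/*
 * granules_probed(0x1008, 0x40, 1) == 4: the granule at 0x1000 is
 * skipped (the caller's put_user()/get_user() already touched it),
 * then 0x1010, 0x1020, 0x1030 and 0x1040 are read, covering the range
 * up to 0x1048.
 */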
From patchwork Wed Dec 1 19:37:50 2021
From: Catalin Marinas
To: Linus Torvalds, Andreas Gruenbacher
Cc: Josef Bacik, David Sterba, Al Viro, Andrew Morton, Will Deacon,
    Matthew Wilcox, linux-fsdevel@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
    linux-btrfs@vger.kernel.org
Subject: [PATCH v2 4/4] btrfs: Avoid live-lock in search_ioctl() on hardware with sub-page faults
Date: Wed, 1 Dec 2021 19:37:50 +0000
Message-Id: <20211201193750.2097885-5-catalin.marinas@arm.com>
In-Reply-To: <20211201193750.2097885-1-catalin.marinas@arm.com>
References: <20211201193750.2097885-1-catalin.marinas@arm.com>

Commit a48b73eca4ce ("btrfs: fix potential deadlock in the search
ioctl") addressed a lockdep warning by pre-faulting the user pages and
attempting the copy_to_user_nofault() in an infinite loop. On
architectures like arm64 with MTE, an access may fault within a page at
a location different from what fault_in_writeable() probed. Since the
sk_offset is rewound to the previous struct btrfs_ioctl_search_header
boundary, there is no guaranteed forward progress and search_ioctl()
may live-lock.

Request a 'min_size' of (*buf_size - sk_offset) from
fault_in_writeable() to check this range for sub-page faults.

Fixes: a48b73eca4ce ("btrfs: fix potential deadlock in the search ioctl")
Signed-off-by: Catalin Marinas
Reported-by: Al Viro
---
 fs/btrfs/ioctl.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index c7d74c8776a1..439cf38f320a 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -2222,8 +2222,13 @@ static noinline int search_ioctl(struct inode *inode,
 	key.offset = sk->min_offset;

 	while (1) {
+		size_t len = *buf_size - sk_offset;
 		ret = -EFAULT;
-		if (fault_in_writeable(ubuf + sk_offset, *buf_size - sk_offset, 0))
+		/*
+		 * Ensure that the whole user buffer is faulted in at sub-page
+		 * granularity, otherwise the loop may live-lock.
+		 */
+		if (fault_in_writeable(ubuf + sk_offset, len, len))
 			break;

 		ret = btrfs_search_forward(root, &key, path, sk->min_transid);
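
A reduced model of the fixed loop; emit_records() is a hypothetical
stand-in for the btrfs_search_forward()/copy_to_user_nofault() step, so
this is a sketch of the control flow rather than the actual function:

#include <linux/errno.h>
#include <linux/pagemap.h>

int emit_records(char __user *ubuf, size_t *sk_offset);	/* hypothetical */

static int search_loop_model(char __user *ubuf, size_t buf_size)
{
	size_t sk_offset = 0;
	int ret;

	while (1) {
		size_t len = buf_size - sk_offset;

		/*
		 * min_size == len: a mismatched tag anywhere in the
		 * window fails the probe up front, so the same
		 * iteration cannot be retried forever.
		 */
		if (fault_in_writeable(ubuf + sk_offset, len, len))
			return -EFAULT;

		ret = emit_records(ubuf, &sk_offset);
		if (ret != -EAGAIN)
			return ret;
		/* -EAGAIN: rewound to a record boundary, retry */
	}
}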