From patchwork Fri Aug 11 14:32:09 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 13350968
Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley
	Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United
	Kingdom. Registered in England and Wales under Company Registration No.
	3798903
From: David Howells
To: Alexander Viro, Jens Axboe, Christoph Hellwig, Christian Brauner,
	Matthew Wilcox, Linus Torvalds
cc: dhowells@redhat.com, jlayton@kernel.org, linux-block@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: [RFC PATCH] iov_iter: Convert iterate*() to inline funcs
Date: Fri, 11 Aug 2023 15:32:09 +0100
Message-ID: <3710261.1691764329@warthog.procyon.org.uk>

Convert the iov_iter iteration macros to inline functions to make the code
easier to follow.  Ideally, the optimiser would produce much the same code in
both cases, but the revised code ends up a bit bigger.
This may be because I'm passing in a pointer to somewhere to place the
checksum for those functions that need it - it could instead be passed in the
private information.

The changes in compiled function size on x86_64 look like:

	_copy_from_iter                          chg 0x36e -> 0x401
	_copy_from_iter_flushcache               chg 0x359 -> 0x374
	_copy_from_iter_nocache                  chg 0x36a -> 0x37f
	_copy_mc_to_iter                         chg 0x3a7 -> 0x3be
	_copy_to_iter                            chg 0x358 -> 0x362
	copy_from_user_iter.constprop.0          new 0x32
	copy_page_from_iter_atomic               chg 0x3d2 -> 0x465
	copy_page_to_iter_nofault.part.0         chg 0x3f1 -> 0x3fe
	copy_to_user_iter.constprop.0            new 0x32
	copy_to_user_iter_mc.constprop.0         new 0x2c
	copyin                                   del 0x30
	copyout                                  del 0x2d
	copyout_mc                               del 0x2b
	csum_and_copy_from_iter                  chg 0x3e8 -> 0x44d
	csum_and_copy_to_iter                    chg 0x46a -> 0x48d
	iov_iter_zero                            chg 0x34f -> 0x36b
	memcpy_from_iter                         new 0x19
	memcpy_from_iter.isra.0                  del 0x1f
	memcpy_from_iter_csum                    new 0x22
	memcpy_from_iter_mc                      new 0xf
	memcpy_to_iter_csum                      new 0x1f
	zero_to_user_iter.part.0                 new 0x18

with the .text section increasing by nearly 700 bytes overall.

Removing the __always_inline tag from iterate_xarray() reduces the text size
by over 2KiB.  That might be worth doing as that's quite an involved
algorithm requiring the RCU lock be taken and doing a bunch of xarray things.

Removing all the __always_inline tags shrinks the text by over 7.5KiB.  I'm
not sure of the performance impact of doing that, though.

Jens: I believe you had a good way of testing the performance?

	"Here's what I get. nullb0 using blk-mq, and submit_queues==NPROC.
	iostats and merging disabled, using 8k bs for t/io_uring to ensure we
	have > 1 segment. Everything pinned to the same CPU to ensure
	reproducibility and stability. Kernel has CONFIG_RETPOLINE enabled."

Can you give me a quick crib as to how I set that up?
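To illustrate the shape of the conversion described above, here is a minimal
userspace sketch (not kernel code) of walking kvec-like segments with an inline
function that takes a step callback instead of expanding a macro around a STEP
expression.  The names struct seg, struct seg_iter, step_f, iterate_segs(),
copy_to_seg_step() and copy_to_segs() are hypothetical and exist only for this
example; what it borrows from the patch is the convention that a step function
returns the number of bytes it did not process.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for struct kvec and struct iov_iter. */
struct seg {
	void *base;
	size_t len;
};

struct seg_iter {
	const struct seg *seg;	/* current segment */
	size_t nr_segs;		/* segments left, including the current one */
	size_t offset;		/* offset into the current segment */
	size_t count;		/* bytes left; assumed to match the segments */
};

/* A step handles len bytes at iter_base and returns the bytes NOT handled. */
typedef size_t (*step_f)(void *iter_base, size_t progress, size_t len,
			 void *priv);

static inline size_t iterate_segs(struct seg_iter *iter, size_t len,
				  void *priv, step_f step)
{
	const struct seg *p = iter->seg;
	size_t progress = 0, skip = iter->offset;

	assert(step != NULL);
	if (len > iter->count)
		len = iter->count;
	if (!len)
		return 0;

	do {
		size_t part = p->len - skip;

		if (part > len)
			part = len;
		if (part) {
			size_t remain = step((char *)p->base + skip,
					     progress, part, priv);
			size_t consumed = part - remain;

			progress += consumed;
			skip += consumed;
			len -= consumed;
			if (skip < p->len)
				break;	/* short step or request satisfied */
		}
		p++;
		skip = 0;
	} while (len);

	/* Write the cursor back once, as the inline helpers in the patch do. */
	iter->offset = skip;
	iter->nr_segs -= (size_t)(p - iter->seg);
	iter->seg = p;
	iter->count -= progress;
	return progress;
}

/* Step: copy from a flat source buffer (passed via priv) into a segment. */
static size_t copy_to_seg_step(void *iter_base, size_t progress, size_t len,
			       void *priv)
{
	memcpy(iter_base, (const char *)priv + progress, len);
	return 0;
}

static size_t copy_to_segs(struct seg_iter *iter, const void *src,
			   size_t bytes)
{
	return iterate_segs(iter, bytes, (void *)src, copy_to_seg_step);
}
```

Because the step is a constant function pointer handed to an inline function,
a compiler can usually fold the call away, though that is not guaranteed.  In
this sketch extra per-call state, such as a checksum accumulator, would travel
inside whatever priv points at rather than as a separate argument.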

Signed-off-by: David Howells
cc: Alexander Viro
cc: Jens Axboe
cc: Christoph Hellwig
cc: Christian Brauner
cc: Matthew Wilcox
cc: Linus Torvalds
cc: linux-block@vger.kernel.org
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
---
 lib/iov_iter.c | 606 ++++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 366 insertions(+), 240 deletions(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index b667b1e2f688..8943ac25e202 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -14,188 +14,283 @@
 #include
 #include
 
-/* covers ubuf and kbuf alike */
-#define iterate_buf(i, n, base, len, off, __p, STEP) {	\
-	size_t __maybe_unused off = 0;	\
-	len = n;	\
-	base = __p + i->iov_offset;	\
-	len -= (STEP);	\
-	i->iov_offset += len;	\
-	n = len;	\
-}
-
-/* covers iovec and kvec alike */
-#define iterate_iovec(i, n, base, len, off, __p, STEP) {	\
-	size_t off = 0;	\
-	size_t skip = i->iov_offset;	\
-	do {	\
-		len = min(n, __p->iov_len - skip);	\
-		if (likely(len)) {	\
-			base = __p->iov_base + skip;	\
-			len -= (STEP);	\
-			off += len;	\
-			skip += len;	\
-			n -= len;	\
-			if (skip < __p->iov_len)	\
-				break;	\
-		}	\
-		__p++;	\
-		skip = 0;	\
-	} while (n);	\
-	i->iov_offset = skip;	\
-	n = off;	\
-}
-
-#define iterate_bvec(i, n, base, len, off, p, STEP) {	\
-	size_t off = 0;	\
-	unsigned skip = i->iov_offset;	\
-	while (n) {	\
-		unsigned offset = p->bv_offset + skip;	\
-		unsigned left;	\
-		void *kaddr = kmap_local_page(p->bv_page +	\
-					offset / PAGE_SIZE);	\
-		base = kaddr + offset % PAGE_SIZE;	\
-		len = min(min(n, (size_t)(p->bv_len - skip)),	\
-		     (size_t)(PAGE_SIZE - offset % PAGE_SIZE));	\
-		left = (STEP);	\
-		kunmap_local(kaddr);	\
-		len -= left;	\
-		off += len;	\
-		skip += len;	\
-		if (skip == p->bv_len) {	\
-			skip = 0;	\
-			p++;	\
-		}	\
-		n -= len;	\
-		if (left)	\
-			break;	\
-	}	\
-	i->iov_offset = skip;	\
-	n = off;	\
-}
-
-#define iterate_xarray(i, n, base, len, __off, STEP) {	\
-	__label__ __out;	\
-	size_t __off = 0;	\
-	struct folio *folio;	\
-	loff_t start = i->xarray_start + i->iov_offset;	\
-	pgoff_t index = start / PAGE_SIZE;	\
-	XA_STATE(xas, i->xarray, index);	\
-	\
-	len = PAGE_SIZE - offset_in_page(start);	\
-	rcu_read_lock();	\
-	xas_for_each(&xas, folio, ULONG_MAX) {	\
-		unsigned left;	\
-		size_t offset;	\
-		if (xas_retry(&xas, folio))	\
-			continue;	\
-		if (WARN_ON(xa_is_value(folio)))	\
-			break;	\
-		if (WARN_ON(folio_test_hugetlb(folio)))	\
-			break;	\
-		offset = offset_in_folio(folio, start + __off);	\
-		while (offset < folio_size(folio)) {	\
-			base = kmap_local_folio(folio, offset);	\
-			len = min(n, len);	\
-			left = (STEP);	\
-			kunmap_local(base);	\
-			len -= left;	\
-			__off += len;	\
-			n -= len;	\
-			if (left || n == 0)	\
-				goto __out;	\
-			offset += len;	\
-			len = PAGE_SIZE;	\
-		}	\
-	}	\
-__out:	\
-	rcu_read_unlock();	\
-	i->iov_offset += __off;	\
-	n = __off;	\
-}
-
-#define __iterate_and_advance(i, n, base, len, off, I, K) {	\
-	if (unlikely(i->count < n))	\
-		n = i->count;	\
-	if (likely(n)) {	\
-		if (likely(iter_is_ubuf(i))) {	\
-			void __user *base;	\
-			size_t len;	\
-			iterate_buf(i, n, base, len, off,	\
-						i->ubuf, (I))	\
-		} else if (likely(iter_is_iovec(i))) {	\
-			const struct iovec *iov = iter_iov(i);	\
-			void __user *base;	\
-			size_t len;	\
-			iterate_iovec(i, n, base, len, off,	\
-						iov, (I))	\
-			i->nr_segs -= iov - iter_iov(i);	\
-			i->__iov = iov;	\
-		} else if (iov_iter_is_bvec(i)) {	\
-			const struct bio_vec *bvec = i->bvec;	\
-			void *base;	\
-			size_t len;	\
-			iterate_bvec(i, n, base, len, off,	\
-						bvec, (K))	\
-			i->nr_segs -= bvec - i->bvec;	\
-			i->bvec = bvec;	\
-		} else if (iov_iter_is_kvec(i)) {	\
-			const struct kvec *kvec = i->kvec;	\
-			void *base;	\
-			size_t len;	\
-			iterate_iovec(i, n, base, len, off,	\
-						kvec, (K))	\
-			i->nr_segs -= kvec - i->kvec;	\
-			i->kvec = kvec;	\
-		} else if (iov_iter_is_xarray(i)) {	\
-			void *base;	\
-			size_t len;	\
-			iterate_xarray(i, n, base, len, off,	\
-							(K))	\
-		}	\
-		i->count -= n;	\
-	}	\
-}
-#define iterate_and_advance(i, n, base, len, off, I, K) \
-	__iterate_and_advance(i, n, base, len, off, I, ((void)(K),0))
-
-static int copyout(void __user *to, const void *from, size_t n)
+typedef size_t (*iov_step_f)(void *iter_base, size_t progress, size_t len,
+			     void *priv, __wsum *csum);
+typedef size_t (*iov_ustep_f)(void __user *iter_base, size_t progress, size_t len,
+			      void *priv, __wsum *csum);
+
+static __always_inline
+size_t iterate_ubuf(struct iov_iter *iter, size_t len, void *priv, __wsum *csum,
+		    iov_ustep_f step)
+{
+	void __user *base = iter->ubuf;
+	size_t remain;
+
+	remain = step(base + iter->iov_offset, 0, len, priv, csum);
+	len -= remain;
+	iter->iov_offset += len;
+	iter->count -= len;
+	return len;
+}
+
+static __always_inline
+size_t iterate_iovec(struct iov_iter *iter, size_t len, void *priv, __wsum *csum,
+		     iov_ustep_f step)
 {
-	if (should_fail_usercopy())
-		return n;
-	if (access_ok(to, n)) {
-		instrument_copy_to_user(to, from, n);
-		n = raw_copy_to_user(to, from, n);
+	const struct iovec *p = iter->__iov;
+	size_t progress = 0, skip = iter->iov_offset;
+
+	do {
+		size_t remain, consumed;
+		size_t part = min(len, p->iov_len - skip);
+
+		if (likely(part)) {
+			remain = step(p->iov_base + skip, progress, part, priv, csum);
+			consumed = part - remain;
+			progress += consumed;
+			skip += consumed;
+			len -= consumed;
+			if (skip < p->iov_len)
+				break;
+		}
+		p++;
+		skip = 0;
+	} while (len);
+
+	iter->iov_offset = skip;
+	iter->nr_segs -= p - iter->__iov;
+	iter->__iov = p;
+	iter->count -= progress;
+	return progress;
+}
+
+static __always_inline
+size_t iterate_kvec(struct iov_iter *iter, size_t len, void *priv, __wsum *csum,
+		    iov_step_f step)
+{
+	const struct kvec *p = iter->kvec;
+	size_t progress = 0, skip = iter->iov_offset;
+
+	do {
+		size_t remain, consumed;
+		size_t part = min(len, p->iov_len - skip);
+
+		if (likely(part)) {
+			remain = step(p->iov_base + skip, progress, part, priv, csum);
+			consumed = part - remain;
+			progress += consumed;
+			skip += consumed;
+			len -= consumed;
+			if (skip < p->iov_len)
+				break;
+		}
+		p++;
+		skip = 0;
+	} while (len);
+
+	iter->iov_offset = skip;
+	iter->nr_segs -= p - iter->kvec;
+	iter->kvec = p;
+	iter->count -= progress;
+	return progress;
+}
+
+static __always_inline
+size_t iterate_bvec(struct iov_iter *iter, size_t len, void *priv, __wsum *csum,
+		    iov_step_f step)
+{
+	const struct bio_vec *p = iter->bvec;
+	size_t progress = 0, skip = iter->iov_offset;
+
+	do {
+		size_t remain, consumed;
+		size_t offset = p->bv_offset + skip, part;
+		void *kaddr = kmap_local_page(p->bv_page + offset / PAGE_SIZE);
+
+		part = min3(len,
+			   (size_t)(p->bv_len - skip),
+			   (size_t)(PAGE_SIZE - offset % PAGE_SIZE));
+		remain = step(kaddr + offset % PAGE_SIZE, progress, part, priv, csum);
+		kunmap_local(kaddr);
+		consumed = part - remain;
+		len -= consumed;
+		progress += consumed;
+		skip += consumed;
+		if (skip >= p->bv_len) {
+			skip = 0;
+			p++;
+		}
+		if (remain)
+			break;
+	} while (len);
+
+	iter->iov_offset = skip;
+	iter->nr_segs -= p - iter->bvec;
+	iter->bvec = p;
+	iter->count -= progress;
+	return progress;
+}
+
+static __always_inline
+size_t iterate_xarray(struct iov_iter *iter, size_t len, void *priv, __wsum *csum,
+		      iov_step_f step)
+{
+	struct folio *folio;
+	size_t progress = 0;
+	loff_t start = iter->xarray_start + iter->iov_offset;
+	pgoff_t index = start / PAGE_SIZE;
+	XA_STATE(xas, iter->xarray, index);
+
+	rcu_read_lock();
+	xas_for_each(&xas, folio, ULONG_MAX) {
+		size_t remain, consumed, offset, part, flen;
+
+		if (xas_retry(&xas, folio))
+			continue;
+		if (WARN_ON(xa_is_value(folio)))
+			break;
+		if (WARN_ON(folio_test_hugetlb(folio)))
+			break;
+
+		offset = offset_in_folio(folio, start);
+		flen = min(folio_size(folio) - offset, len);
+		start += flen;
+
+		while (flen) {
+			void *base = kmap_local_folio(folio, offset);
+
+			part = min(flen, PAGE_SIZE - offset_in_page(offset));
+			remain = step(base, progress, part, priv, csum);
+			kunmap_local(base);
+
+			consumed = part - remain;
+			progress += consumed;
+			len -= consumed;
+
+			if (remain || len == 0)
+				goto out;
+			flen -= consumed;
+			offset += consumed;
+		}
+	}
+
+out:
+	rcu_read_unlock();
+	iter->iov_offset += progress;
+	iter->count -= progress;
+	return progress;
+}
+
+static __always_inline
+size_t iterate_and_advance(struct iov_iter *iter, size_t len, void *priv,
+			   iov_ustep_f ustep, iov_step_f step)
+{
+	if (unlikely(iter->count < len))
+		len = iter->count;
+	if (unlikely(!len))
+		return 0;
+
+	switch (iov_iter_type(iter)) {
+	case ITER_UBUF:
+		return iterate_ubuf(iter, len, priv, NULL, ustep);
+	case ITER_IOVEC:
+		return iterate_iovec(iter, len, priv, NULL, ustep);
+	case ITER_KVEC:
+		return iterate_kvec(iter, len, priv, NULL, step);
+	case ITER_BVEC:
+		return iterate_bvec(iter, len, priv, NULL, step);
+	case ITER_XARRAY:
+		return iterate_xarray(iter, len, priv, NULL, step);
+	case ITER_DISCARD:
+		iter->count -= len;
+		return len;
 	}
-	return n;
+	return 0;
 }
 
-static int copyout_nofault(void __user *to, const void *from, size_t n)
+static __always_inline
+size_t iterate_and_advance_csum(struct iov_iter *iter, size_t len, void *priv,
+				__wsum *csum, iov_ustep_f ustep, iov_step_f step)
 {
-	long res;
+	if (unlikely(iter->count < len))
+		len = iter->count;
+	if (unlikely(!len))
+		return 0;
 
+	switch (iov_iter_type(iter)) {
+	case ITER_UBUF:
+		return iterate_ubuf(iter, len, priv, csum, ustep);
+	case ITER_IOVEC:
+		return iterate_iovec(iter, len, priv, csum, ustep);
+	case ITER_KVEC:
+		return iterate_kvec(iter, len, priv, csum, step);
+	case ITER_BVEC:
+		return iterate_bvec(iter, len, priv, csum, step);
+	case ITER_XARRAY:
+		return iterate_xarray(iter, len, priv, csum, step);
+	case ITER_DISCARD:
+		iter->count -= len;
+		return len;
+	}
+	return 0;
+}
+
+static size_t copy_to_user_iter(void __user *iter_to, size_t progress,
+				size_t len, void *from, __wsum *csum)
+{
 	if (should_fail_usercopy())
-		return n;
+		return len;
+	if (access_ok(iter_to, len)) {
+		from += progress;
+		instrument_copy_to_user(iter_to, from, len);
+		len = raw_copy_to_user(iter_to, from, len);
+	}
+	return len;
+}
+
+static size_t copy_to_user_iter_nofault(void __user *iter_to, size_t progress,
+					size_t len, void *from, __wsum *csum)
+{
+	ssize_t res;
 
-	res = copy_to_user_nofault(to, from, n);
+	if (should_fail_usercopy())
+		return len;
 
-	return res < 0 ? n : res;
+	from += progress;
+	res = copy_to_user_nofault(iter_to, from, len);
+	return res < 0 ? len : res;
 }
 
-static int copyin(void *to, const void __user *from, size_t n)
+static size_t copy_from_user_iter(void __user *iter_from, size_t progress,
+				  size_t len, void *to, __wsum *csum)
 {
-	size_t res = n;
+	size_t res = len;
 
 	if (should_fail_usercopy())
-		return n;
-	if (access_ok(from, n)) {
-		instrument_copy_from_user_before(to, from, n);
-		res = raw_copy_from_user(to, from, n);
-		instrument_copy_from_user_after(to, from, n, res);
+		return len;
+	if (access_ok(iter_from, len)) {
+		to += progress;
+		instrument_copy_from_user_before(to, iter_from, len);
+		res = raw_copy_from_user(to, iter_from, len);
+		instrument_copy_from_user_after(to, iter_from, len, res);
 	}
 	return res;
 }
 
+static size_t memcpy_to_iter(void *iter_to, size_t progress,
+			     size_t len, void *from, __wsum *csum)
+{
+	memcpy(iter_to, from + progress, len);
+	return 0;
+}
+
+static size_t memcpy_from_iter(void *iter_from, size_t progress,
+			       size_t len, void *to, __wsum *csum)
+{
+	memcpy(to + progress, iter_from, len);
+	return 0;
+}
+
 /*
  * fault_in_iov_iter_readable - fault in iov iterator for reading
  * @i: iterator
@@ -313,23 +408,27 @@ size_t _copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
 		return 0;
 	if (user_backed_iter(i))
 		might_fault();
-	iterate_and_advance(i, bytes, base, len, off,
-		copyout(base, addr + off, len),
-		memcpy(base, addr + off, len)
-	)
-
-	return bytes;
+	return iterate_and_advance(i, bytes, (void *)addr,
+				   copy_to_user_iter, memcpy_to_iter);
 }
 EXPORT_SYMBOL(_copy_to_iter);
 
 #ifdef CONFIG_ARCH_HAS_COPY_MC
-static int copyout_mc(void __user *to, const void *from, size_t n)
+static size_t copy_to_user_iter_mc(void __user *iter_to, size_t progress,
+				   size_t len, void *from, __wsum *csum)
 {
-	if (access_ok(to, n)) {
-		instrument_copy_to_user(to, from, n);
-		n = copy_mc_to_user((__force void *) to, from, n);
+	if (access_ok(iter_to, len)) {
+		from += progress;
+		instrument_copy_to_user(iter_to, from, len);
+		len = copy_mc_to_user(iter_to, from, len);
 	}
-	return n;
+	return len;
+}
+
+static size_t memcpy_to_iter_mc(void *iter_to, size_t progress,
+				size_t len, void *from, __wsum *csum)
+{
+	return copy_mc_to_kernel(iter_to, from + progress, len);
 }
 
 /**
@@ -362,22 +461,16 @@ size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i)
 		return 0;
 	if (user_backed_iter(i))
 		might_fault();
-	__iterate_and_advance(i, bytes, base, len, off,
-		copyout_mc(base, addr + off, len),
-		copy_mc_to_kernel(base, addr + off, len)
-	)
-
-	return bytes;
+	return iterate_and_advance(i, bytes, (void *)addr,
+				   copy_to_user_iter_mc, memcpy_to_iter_mc);
 }
 EXPORT_SYMBOL_GPL(_copy_mc_to_iter);
 #endif /* CONFIG_ARCH_HAS_COPY_MC */
 
-static void *memcpy_from_iter(struct iov_iter *i, void *to, const void *from,
-				 size_t size)
+static size_t memcpy_from_iter_mc(void *iter_from, size_t progress,
+				  size_t len, void *to, __wsum *csum)
 {
-	if (iov_iter_is_copy_mc(i))
-		return (void *)copy_mc_to_kernel(to, from, size);
-	return memcpy(to, from, size);
+	return copy_mc_to_kernel(to + progress, iter_from, len);
 }
 
 size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
@@ -387,30 +480,37 @@ size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
 
 	if (user_backed_iter(i))
 		might_fault();
-	iterate_and_advance(i, bytes, base, len, off,
-		copyin(addr + off, base, len),
-		memcpy_from_iter(i, addr + off, base, len)
-	)
-
-	return bytes;
+	return iterate_and_advance(i, bytes, addr,
+				   copy_from_user_iter,
+				   iov_iter_is_copy_mc(i) ?
+				   memcpy_from_iter_mc : memcpy_from_iter);
 }
 EXPORT_SYMBOL(_copy_from_iter);
 
+static size_t copy_from_user_iter_nocache(void __user *iter_from, size_t progress,
+					  size_t len, void *to, __wsum *csum)
+{
+	return __copy_from_user_inatomic_nocache(to + progress, iter_from, len);
+}
+
 size_t _copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i)
 {
 	if (WARN_ON_ONCE(!i->data_source))
 		return 0;
 
-	iterate_and_advance(i, bytes, base, len, off,
-		__copy_from_user_inatomic_nocache(addr + off, base, len),
-		memcpy(addr + off, base, len)
-	)
-
-	return bytes;
+	return iterate_and_advance(i, bytes, addr,
+				   copy_from_user_iter_nocache,
+				   memcpy_from_iter);
 }
 EXPORT_SYMBOL(_copy_from_iter_nocache);
 
 #ifdef CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE
+static size_t copy_from_user_iter_flushcache(void __user *iter_from, size_t progress,
+					     size_t len, void *to, __wsum *csum)
+{
+	return __copy_from_user_flushcache(to + progress, iter_from, len);
+}
+
 /**
  * _copy_from_iter_flushcache - write destination through cpu cache
  * @addr: destination kernel address
@@ -432,12 +532,9 @@ size_t _copy_from_iter_flushcache(void *addr, size_t bytes, struct iov_iter *i)
 	if (WARN_ON_ONCE(!i->data_source))
 		return 0;
 
-	iterate_and_advance(i, bytes, base, len, off,
-		__copy_from_user_flushcache(addr + off, base, len),
-		memcpy_flushcache(addr + off, base, len)
-	)
-
-	return bytes;
+	return iterate_and_advance(i, bytes, addr,
+				   copy_from_user_iter_flushcache,
+				   memcpy_from_iter);
 }
 EXPORT_SYMBOL_GPL(_copy_from_iter_flushcache);
 #endif
@@ -509,10 +606,9 @@ size_t copy_page_to_iter_nofault(struct page *page, unsigned offset, size_t byte
 		void *kaddr = kmap_local_page(page);
 		size_t n = min(bytes, (size_t)PAGE_SIZE - offset);
 
-		iterate_and_advance(i, n, base, len, off,
-			copyout_nofault(base, kaddr + offset + off, len),
-			memcpy(base, kaddr + offset + off, len)
-		)
+		n = iterate_and_advance(i, bytes, kaddr,
+					copy_to_user_iter_nofault,
+					memcpy_to_iter);
 		kunmap_local(kaddr);
 		res += n;
 		bytes -= n;
@@ -555,14 +651,23 @@ size_t copy_page_from_iter(struct page *page, size_t offset, size_t bytes,
 }
 EXPORT_SYMBOL(copy_page_from_iter);
 
-size_t iov_iter_zero(size_t bytes, struct iov_iter *i)
+static size_t zero_to_user_iter(void __user *iter_to, size_t progress,
+				size_t len, void *from, __wsum *csum)
 {
-	iterate_and_advance(i, bytes, base, len, count,
-		clear_user(base, len),
-		memset(base, 0, len)
-	)
+	return clear_user(iter_to, len);
+}
 
-	return bytes;
+static size_t zero_to_iter(void *iter_to, size_t progress,
+			   size_t len, void *from, __wsum *csum)
+{
+	memset(iter_to, 0, len);
+	return 0;
+}
+
+size_t iov_iter_zero(size_t bytes, struct iov_iter *i)
+{
+	return iterate_and_advance(i, bytes, NULL,
+				   zero_to_user_iter, zero_to_iter);
 }
 EXPORT_SYMBOL(iov_iter_zero);
 
@@ -578,10 +683,11 @@ size_t copy_page_from_iter_atomic(struct page *page, unsigned offset, size_t byt
 		kunmap_atomic(kaddr);
 		return 0;
 	}
-	iterate_and_advance(i, bytes, base, len, off,
-		copyin(p + off, base, len),
-		memcpy_from_iter(i, p + off, base, len)
-	)
+
+	bytes = iterate_and_advance(i, bytes, p,
+				    copy_from_user_iter,
+				    iov_iter_is_copy_mc(i) ?
+				    memcpy_from_iter_mc : memcpy_from_iter);
 	kunmap_atomic(kaddr);
 	return bytes;
 }
@@ -1168,32 +1274,56 @@ ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i,
 }
 EXPORT_SYMBOL(iov_iter_get_pages_alloc2);
 
+static size_t copy_from_user_iter_csum(void __user *iter_from, size_t progress,
+				       size_t len, void *to, __wsum *csum)
+{
+	__wsum next;
+
+	next = csum_and_copy_from_user(iter_from, to + progress, len);
+	*csum = csum_block_add(*csum, next, progress);
+	return next ? 0 : len;
+}
+
+static size_t memcpy_from_iter_csum(void *iter_from, size_t progress,
+				    size_t len, void *to, __wsum *csum)
+{
+	*csum = csum_and_memcpy(to + progress, iter_from, len, *csum, progress);
+	return 0;
+}
+
 size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum,
 			       struct iov_iter *i)
 {
-	__wsum sum, next;
-	sum = *csum;
 	if (WARN_ON_ONCE(!i->data_source))
 		return 0;
-
-	iterate_and_advance(i, bytes, base, len, off, ({
-		next = csum_and_copy_from_user(base, addr + off, len);
-		sum = csum_block_add(sum, next, off);
-		next ? 0 : len;
-	}), ({
-		sum = csum_and_memcpy(addr + off, base, len, sum, off);
-	})
-	)
-	*csum = sum;
-	return bytes;
+	return iterate_and_advance_csum(i, bytes, addr, csum,
+					copy_from_user_iter_csum,
+					memcpy_from_iter_csum);
 }
 EXPORT_SYMBOL(csum_and_copy_from_iter);
 
+static size_t copy_to_user_iter_csum(void __user *iter_to, size_t progress,
+				     size_t len, void *from, __wsum *csum)
+{
+	__wsum next;
+
+	next = csum_and_copy_to_user(from + progress, iter_to, len);
+	*csum = csum_block_add(*csum, next, progress);
+	return next ? 0 : len;
+}
+
+static size_t memcpy_to_iter_csum(void *iter_to, size_t progress,
+				  size_t len, void *from, __wsum *csum)
+{
+	*csum = csum_and_memcpy(iter_to, from + progress, len, *csum, progress);
+	return 0;
+}
+
 size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *_csstate,
 			     struct iov_iter *i)
 {
 	struct csum_state *csstate = _csstate;
-	__wsum sum, next;
+	__wsum sum;
 
 	if (WARN_ON_ONCE(i->data_source))
 		return 0;
@@ -1207,14 +1337,10 @@ size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *_csstate,
 	}
 
 	sum = csum_shift(csstate->csum, csstate->off);
-	iterate_and_advance(i, bytes, base, len, off, ({
-		next = csum_and_copy_to_user(addr + off, base, len);
-		sum = csum_block_add(sum, next, off);
-		next ? 0 : len;
-	}), ({
-		sum = csum_and_memcpy(base, addr + off, len, sum, off);
-	})
-	)
+
+	bytes = iterate_and_advance_csum(i, bytes, (void *)addr, &sum,
+					 copy_to_user_iter_csum,
+					 memcpy_to_iter_csum);
 	csstate->csum = csum_shift(sum, csstate->off);
 	csstate->off += bytes;
 	return bytes;