From patchwork Fri May 17 21:41:29 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 10948677 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 55E9C14DB for ; Fri, 17 May 2019 21:41:42 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 46DA12847F for ; Fri, 17 May 2019 21:41:42 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 3AD9728485; Fri, 17 May 2019 21:41:42 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5638C28481 for ; Fri, 17 May 2019 21:41:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727644AbfEQVlj (ORCPT ); Fri, 17 May 2019 17:41:39 -0400 Received: from mail-pl1-f196.google.com ([209.85.214.196]:41952 "EHLO mail-pl1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727474AbfEQVlj (ORCPT ); Fri, 17 May 2019 17:41:39 -0400 Received: by mail-pl1-f196.google.com with SMTP id f12so3907138plt.8 for ; Fri, 17 May 2019 14:41:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=6zAlRh9OPYhPFQn4Cl/zMapUx0b+2kyu/Sbx6quVpQU=; b=cVLb9VW5kjd+lYa1M9FVRl9blBqTIjIZ5HEj0shwuAojucnsh9a9xepahCxl9Fzaip FXo8KeUeyA1ltkFdprAhN+UVmAol4psb3bO4sbBAZ4gWP38kYLH0KaLkzY56zSWiXjQ1 BOBX/TF/DgMBe2mLRor2KEWVHiFcUrbjXBEm8Id0HNhg3ZR01vO1uPYMqrEkdjCjNtTO 1LS9lnNXOvKsUa0XKSfYvpfXN+ByWcrdcn3BgpW2hISy6AkiA0w5gAt54iyicoFN9kep NOWuOWs+JVZN8VvG7+ZBiJrerg0BAR3P2fseV7PcoY1D5YFtWyqAMuFqgBAP/TiKYjYf IM8Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=6zAlRh9OPYhPFQn4Cl/zMapUx0b+2kyu/Sbx6quVpQU=; b=qmvDdPVxF973bcvfHS1UMaYQR+aXbzlslE7MSHnScwKbeDcqzSIW9f3v/imp168AFa kBvZcTU3EO0VTamTph5zvv/aRUJi5O0nYaK9DyK7W8wbCaTbcv7FEz3Bog51kC8Z3nFU 5lB4zZkx6wyf+6KDv+/hshceN2MAqjHVRuYfhAIHPyUXeczvz02Bmzf/MXIix99xn0PL Rmzs+T+0FKUQ4/2EGRDzvux7OxuL0MN08Vw7A7ikDsXCho+2kctj0SNWlw1SYz5pCecj EeFOLUXx4biOekAcjNTUh2C8PUrx/bQbo9imNCFP/YZXZCCDHrgcCjSLU/5iLItu7Xwx MgBw== X-Gm-Message-State: APjAAAUKTHh5z4V264pEkLZb7ZJk2JV6rvsdIizgWAFEIlI0vCl1VsBC q0bmLMfb+MPkkIOFEIcjF1nNr7U/QDKthg== X-Google-Smtp-Source: APXvYqwgtCB6cxUJEH96U+O9qrI31proNY65sh+24QyvRhPY18xvtMlfKKXLaK3v3zosCaNrf21lew== X-Received: by 2002:a17:902:2bc5:: with SMTP id l63mr60983501plb.202.1558129298111; Fri, 17 May 2019 14:41:38 -0700 (PDT) Received: from x1.localdomain (66.29.164.166.static.utbb.net. [66.29.164.166]) by smtp.gmail.com with ESMTPSA id z125sm15885331pfb.75.2019.05.17.14.41.36 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 17 May 2019 14:41:37 -0700 (PDT) From: Jens Axboe To: linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org Cc: viro@zeniv.linux.org.uk, Jens Axboe Subject: [PATCH 1/3] uio: make import_iovec()/compat_import_iovec() return bytes on success Date: Fri, 17 May 2019 15:41:29 -0600 Message-Id: <20190517214131.5925-2-axboe@kernel.dk> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190517214131.5925-1-axboe@kernel.dk> References: <20190517214131.5925-1-axboe@kernel.dk> Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Currently these functions return < 0 on error, and 0 for success. Change that so that we return < 0 on error, but number of bytes for success. Some callers already treat the return value that way, others need a slight tweak. Signed-off-by: Jens Axboe --- fs/aio.c | 9 +++++---- fs/io_uring.c | 16 ++++++++-------- fs/splice.c | 8 ++++---- include/linux/uio.h | 4 ++-- lib/iov_iter.c | 15 ++++++++------- net/compat.c | 3 ++- net/socket.c | 3 ++- 7 files changed, 31 insertions(+), 27 deletions(-) diff --git a/fs/aio.c b/fs/aio.c index 3490d1fa0e16..41824c710b36 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -1479,8 +1479,9 @@ static int aio_prep_rw(struct kiocb *req, const struct iocb *iocb) return 0; } -static int aio_setup_rw(int rw, const struct iocb *iocb, struct iovec **iovec, - bool vectored, bool compat, struct iov_iter *iter) +static ssize_t aio_setup_rw(int rw, const struct iocb *iocb, + struct iovec **iovec, bool vectored, bool compat, + struct iov_iter *iter) { void __user *buf = (void __user *)(uintptr_t)iocb->aio_buf; size_t len = iocb->aio_nbytes; @@ -1537,7 +1538,7 @@ static int aio_read(struct kiocb *req, const struct iocb *iocb, return -EINVAL; ret = aio_setup_rw(READ, iocb, &iovec, vectored, compat, &iter); - if (ret) + if (ret < 0) return ret; ret = rw_verify_area(READ, file, &req->ki_pos, iov_iter_count(&iter)); if (!ret) @@ -1565,7 +1566,7 @@ static int aio_write(struct kiocb *req, const struct iocb *iocb, return -EINVAL; ret = aio_setup_rw(WRITE, iocb, &iovec, vectored, compat, &iter); - if (ret) + if (ret < 0) return ret; ret = rw_verify_area(WRITE, file, &req->ki_pos, iov_iter_count(&iter)); if (!ret) { diff --git a/fs/io_uring.c b/fs/io_uring.c index 65dd05505a16..84cb31e69727 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -1003,9 +1003,9 @@ static int io_import_fixed(struct io_ring_ctx *ctx, int rw, return 0; } -static int io_import_iovec(struct io_ring_ctx *ctx, int rw, - const struct sqe_submit *s, struct iovec **iovec, - struct iov_iter *iter) +static ssize_t io_import_iovec(struct io_ring_ctx *ctx, int rw, + const struct sqe_submit *s, struct iovec **iovec, + struct iov_iter *iter) { const struct io_uring_sqe *sqe = s->sqe; void __user *buf = u64_to_user_ptr(READ_ONCE(sqe->addr)); @@ -1023,7 +1023,7 @@ static int io_import_iovec(struct io_ring_ctx *ctx, int rw, opcode = READ_ONCE(sqe->opcode); if (opcode == IORING_OP_READ_FIXED || opcode == IORING_OP_WRITE_FIXED) { - int ret = io_import_fixed(ctx, rw, sqe, iter); + ssize_t ret = io_import_fixed(ctx, rw, sqe, iter); *iovec = NULL; return ret; } @@ -1089,7 +1089,7 @@ static int io_read(struct io_kiocb *req, const struct sqe_submit *s, struct iov_iter iter; struct file *file; size_t iov_count; - int ret; + ssize_t ret; ret = io_prep_rw(req, s, force_nonblock); if (ret) @@ -1102,7 +1102,7 @@ static int io_read(struct io_kiocb *req, const struct sqe_submit *s, return -EINVAL; ret = io_import_iovec(req->ctx, READ, s, &iovec, &iter); - if (ret) + if (ret < 0) return ret; iov_count = iov_iter_count(&iter); @@ -1136,7 +1136,7 @@ static int io_write(struct io_kiocb *req, const struct sqe_submit *s, struct iov_iter iter; struct file *file; size_t iov_count; - int ret; + ssize_t ret; ret = io_prep_rw(req, s, force_nonblock); if (ret) @@ -1149,7 +1149,7 @@ static int io_write(struct io_kiocb *req, const struct sqe_submit *s, return -EINVAL; ret = io_import_iovec(req->ctx, WRITE, s, &iovec, &iter); - if (ret) + if (ret < 0) return ret; iov_count = iov_iter_count(&iter); diff --git a/fs/splice.c b/fs/splice.c index 25212dcca2df..c5b951b0b639 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -1355,7 +1355,7 @@ SYSCALL_DEFINE4(vmsplice, int, fd, const struct iovec __user *, uiov, struct iovec iovstack[UIO_FASTIOV]; struct iovec *iov = iovstack; struct iov_iter iter; - long error; + ssize_t error; struct fd f; int type; @@ -1366,7 +1366,7 @@ SYSCALL_DEFINE4(vmsplice, int, fd, const struct iovec __user *, uiov, error = import_iovec(type, uiov, nr_segs, ARRAY_SIZE(iovstack), &iov, &iter); - if (!error) { + if (error >= 0) { error = do_vmsplice(f.file, &iter, flags); kfree(iov); } @@ -1381,7 +1381,7 @@ COMPAT_SYSCALL_DEFINE4(vmsplice, int, fd, const struct compat_iovec __user *, io struct iovec iovstack[UIO_FASTIOV]; struct iovec *iov = iovstack; struct iov_iter iter; - long error; + ssize_t error; struct fd f; int type; @@ -1392,7 +1392,7 @@ COMPAT_SYSCALL_DEFINE4(vmsplice, int, fd, const struct compat_iovec __user *, io error = compat_import_iovec(type, iov32, nr_segs, ARRAY_SIZE(iovstack), &iov, &iter); - if (!error) { + if (error >= 0) { error = do_vmsplice(f.file, &iter, flags); kfree(iov); } diff --git a/include/linux/uio.h b/include/linux/uio.h index 2d0131ad4604..a61ceb6575ab 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -279,13 +279,13 @@ bool csum_and_copy_from_iter_full(void *addr, size_t bytes, __wsum *csum, struct size_t hash_and_copy_to_iter(const void *addr, size_t bytes, void *hashp, struct iov_iter *i); -int import_iovec(int type, const struct iovec __user * uvector, +ssize_t import_iovec(int type, const struct iovec __user * uvector, unsigned nr_segs, unsigned fast_segs, struct iovec **iov, struct iov_iter *i); #ifdef CONFIG_COMPAT struct compat_iovec; -int compat_import_iovec(int type, const struct compat_iovec __user * uvector, +ssize_t compat_import_iovec(int type, const struct compat_iovec __user * uvector, unsigned nr_segs, unsigned fast_segs, struct iovec **iov, struct iov_iter *i); #endif diff --git a/lib/iov_iter.c b/lib/iov_iter.c index f74fa832f3aa..ad23f930f703 100644 --- a/lib/iov_iter.c +++ b/lib/iov_iter.c @@ -1633,9 +1633,9 @@ EXPORT_SYMBOL(dup_iter); * on-stack array was used or not (and regardless of whether this function * returns an error or not). * - * Return: 0 on success or negative error code on error. + * Return: Negative error code on error, bytes imported on success */ -int import_iovec(int type, const struct iovec __user * uvector, +ssize_t import_iovec(int type, const struct iovec __user * uvector, unsigned nr_segs, unsigned fast_segs, struct iovec **iov, struct iov_iter *i) { @@ -1651,16 +1651,17 @@ int import_iovec(int type, const struct iovec __user * uvector, } iov_iter_init(i, type, p, nr_segs, n); *iov = p == *iov ? NULL : p; - return 0; + return n; } EXPORT_SYMBOL(import_iovec); #ifdef CONFIG_COMPAT #include -int compat_import_iovec(int type, const struct compat_iovec __user * uvector, - unsigned nr_segs, unsigned fast_segs, - struct iovec **iov, struct iov_iter *i) +ssize_t compat_import_iovec(int type, + const struct compat_iovec __user * uvector, + unsigned nr_segs, unsigned fast_segs, + struct iovec **iov, struct iov_iter *i) { ssize_t n; struct iovec *p; @@ -1674,7 +1675,7 @@ int compat_import_iovec(int type, const struct compat_iovec __user * uvector, } iov_iter_init(i, type, p, nr_segs, n); *iov = p == *iov ? NULL : p; - return 0; + return n; } #endif diff --git a/net/compat.c b/net/compat.c index a031bd333092..eb498fbf35de 100644 --- a/net/compat.c +++ b/net/compat.c @@ -79,9 +79,10 @@ int get_compat_msghdr(struct msghdr *kmsg, kmsg->msg_iocb = NULL; - return compat_import_iovec(save_addr ? READ : WRITE, + err = compat_import_iovec(save_addr ? READ : WRITE, compat_ptr(msg.msg_iov), msg.msg_iovlen, UIO_FASTIOV, iov, &kmsg->msg_iter); + return err < 0 ? err : 0; } /* Bleech... */ diff --git a/net/socket.c b/net/socket.c index bdd6942f6152..418ae316e6f8 100644 --- a/net/socket.c +++ b/net/socket.c @@ -2208,9 +2208,10 @@ static int copy_msghdr_from_user(struct msghdr *kmsg, kmsg->msg_iocb = NULL; - return import_iovec(save_addr ? READ : WRITE, + err = import_iovec(save_addr ? READ : WRITE, msg.msg_iov, msg.msg_iovlen, UIO_FASTIOV, iov, &kmsg->msg_iter); + return err < 0 ? err : 0; } static int ___sys_sendmsg(struct socket *sock, struct user_msghdr __user *msg, From patchwork Fri May 17 21:41:30 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 10948681 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2C4EE1515 for ; Fri, 17 May 2019 21:41:43 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1DCED2847F for ; Fri, 17 May 2019 21:41:43 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 123BF28481; Fri, 17 May 2019 21:41:43 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id BAF4C2848B for ; Fri, 17 May 2019 21:41:42 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727474AbfEQVll (ORCPT ); Fri, 17 May 2019 17:41:41 -0400 Received: from mail-pl1-f194.google.com ([209.85.214.194]:46018 "EHLO mail-pl1-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727698AbfEQVlk (ORCPT ); Fri, 17 May 2019 17:41:40 -0400 Received: by mail-pl1-f194.google.com with SMTP id a5so3891694pls.12 for ; Fri, 17 May 2019 14:41:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=/jHW6cMkDx3mVPfEO6LslioW8HEfGg9KsHH7NblluP8=; b=C54D+ubtzGZYydcfOL7SEs8biFf9ApAqZiN0+FPzUtHDeqo8wEqU3ZdQJ2ExAfCnPJ 6OrrAA/5GcP+HzuiLPOCXehXhvJdACpiF6bYJSspsDZ/iln3VnLPjkUdBjd08J0HR8TT dAF31ZUS2vkkL5/kkSiArcqAFMWXHnH+C2J2x+TMddQdl4BKpKeSGKLZIlLZlj5wYWRj X8VcHJeCWmZ08ytM6yaGmno1ETA2T6EWCC7UNYAeXDHCNomniSBfdwdjfELiIXxN92kR O+owbAjgkHNexzN5L/qjNP6zLMzHMsIye8fMwayevGwQkOL43y09bC7OQa76nhN3Jm91 QivQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=/jHW6cMkDx3mVPfEO6LslioW8HEfGg9KsHH7NblluP8=; b=I25Dld2iPM3AfWaWbxTTd0GTfUkibKimTwfWODjtDC6jKb57WswGrxzJ6WTLmsFXn/ 5+EtdDkSaUy4SZie+HVrmWPkyC8pZOeSBRMgmPEznQxCLjnd660LvVMlT+ndEDPckxOq hqx7xDg9VwaRAAD6x9MZRuoO8b9teTz9qcw4ZR9FsYUyJs9KaNaPjjKWC36dfrTXq9f0 u5CLIMeJPEQ43vVrlkY/5pHfoV38blCuqONSApuVFvSoAR6O1+y86HMZUFNPtKpte8mE UmOybfnvVhVnalvbdJ9eYRihUBoVrH0XQ+tQmGVJ7a4lHtNPQvML/Ol+2Vbgwwm4cEIp fd9Q== X-Gm-Message-State: APjAAAV6DFeq5IXQtFb+sRx8Q2TSDAcgbM7YAdaAux3OQwqk3W2gYoqq IFOCY+Gm642RYgwMYr6rEHcN2CBANmhMBw== X-Google-Smtp-Source: APXvYqw/pHAEwy6zewJqMf4CyQcBxIRh8CIf5P2ChqsBZmhu22V1LoceofI9tNeceNYWjfGURUUO8g== X-Received: by 2002:a17:902:42a5:: with SMTP id h34mr37252464pld.178.1558129299912; Fri, 17 May 2019 14:41:39 -0700 (PDT) Received: from x1.localdomain (66.29.164.166.static.utbb.net. [66.29.164.166]) by smtp.gmail.com with ESMTPSA id z125sm15885331pfb.75.2019.05.17.14.41.38 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 17 May 2019 14:41:39 -0700 (PDT) From: Jens Axboe To: linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org Cc: viro@zeniv.linux.org.uk, Jens Axboe Subject: [PATCH 2/3] io_uring: punt short reads to async context Date: Fri, 17 May 2019 15:41:30 -0600 Message-Id: <20190517214131.5925-3-axboe@kernel.dk> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190517214131.5925-1-axboe@kernel.dk> References: <20190517214131.5925-1-axboe@kernel.dk> Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP We can encounter a short read when we're doing buffered reads and the data is partially cached. Right now we just return the short read, but that forces the application to read that CQE, then issue another SQE to finish the read. That read will not be cached, and hence will result in an async punt. It's more efficient to do that async punt from within the kernel, as that will the not need two round trips more to the kernel. Signed-off-by: Jens Axboe --- fs/io_uring.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index 84cb31e69727..3e7f08094a85 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -1089,7 +1089,7 @@ static int io_read(struct io_kiocb *req, const struct sqe_submit *s, struct iov_iter iter; struct file *file; size_t iov_count; - ssize_t ret; + ssize_t read_size, ret; ret = io_prep_rw(req, s, force_nonblock); if (ret) @@ -1105,13 +1105,24 @@ static int io_read(struct io_kiocb *req, const struct sqe_submit *s, if (ret < 0) return ret; + read_size = ret; iov_count = iov_iter_count(&iter); ret = rw_verify_area(READ, file, &kiocb->ki_pos, iov_count); if (!ret) { ssize_t ret2; - /* Catch -EAGAIN return for forced non-blocking submission */ ret2 = call_read_iter(file, kiocb, &iter); + /* + * In case of a short read, punt to async. This can happen + * if we have data partially cached. Alternatively we can + * return the short read, in which case the application will + * need to issue another SQE and wait for it. That SQE will + * need async punt anyway, so it's more efficient to do it + * here. + */ + if (force_nonblock && ret2 > 0 && ret2 < read_size) + ret2 = -EAGAIN; + /* Catch -EAGAIN return for forced non-blocking submission */ if (!force_nonblock || ret2 != -EAGAIN) { io_rw_done(kiocb, ret2); } else { From patchwork Fri May 17 21:41:31 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jens Axboe X-Patchwork-Id: 10948685 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1A24C13AD for ; Fri, 17 May 2019 21:41:47 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 0A7C72847F for ; Fri, 17 May 2019 21:41:47 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id F2E9828485; Fri, 17 May 2019 21:41:46 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI autolearn=unavailable version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 17A682847F for ; Fri, 17 May 2019 21:41:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727806AbfEQVlp (ORCPT ); Fri, 17 May 2019 17:41:45 -0400 Received: from mail-pl1-f194.google.com ([209.85.214.194]:42636 "EHLO mail-pl1-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727771AbfEQVlm (ORCPT ); Fri, 17 May 2019 17:41:42 -0400 Received: by mail-pl1-f194.google.com with SMTP id x15so3905999pln.9 for ; Fri, 17 May 2019 14:41:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20150623.gappssmtp.com; s=20150623; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=203ULvcqchdXRyFRjwUMpCGtfQTE/xI1Zq4IJe/er60=; b=zQZQh0kiidK3+1zSHzTXqZeZGy7G+LcPsIDMEPSDNeBIJapW1OYVOrv2zMZ+xWZY75 sBrPaoWGYhrpZ0iQ8loUkQblRAHT9Rm1zEA+KqCFo2fQGpLo4+nZUk7gm0Bzo1OjuNNe kDqRYTEU5TX+5yvYTa1TZvT3Rhi+OkemVlLSWG4RYINC2bgGktEEldNWW9/ORnrXg4Do 2eeeqfuNGxtHWcUHTBEmvXCk2K3D7RyJ30/bshIaKxnGnLb0wsTE8JvXh83cVFSKM7F+ cs8qzzybq61O73wpxCAnshmVqgwclL2k4w2HDRwNnd0SkIEQ5YYIwwmafauu+tWTRWGw 0ksw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=203ULvcqchdXRyFRjwUMpCGtfQTE/xI1Zq4IJe/er60=; b=itNG9DTcZnjZVyS4MBGS2dJJ5sct1geWzGS5Y7xQZ6tbhQHXLX8KW15SXKx+t5+ABJ 2Gy0ySj3y+46F/KFgByavW1Pc111X1ck3Ul4bCtsKlN5J6xQFHBSh3a4dfTunijoCY2n +4SJDbafcsFN0dbTAXkNPtvo35mioO16E23Zs9+e9w3fvirjjB+pmn+k5mJt6vK2JE5V 6gB+w/gIX+gAsAVlCPg2R0mgUmrLOiAO459Z/f3zpDQfbSahYRAiFAHeWZh+d9qxV93v JyWitBYlFnK5HFpyt7aCK2drEfxnBQFH3J4UT1LqK6xyU5/d0MLoIDMF50IVfA6Wtgbc A14Q== X-Gm-Message-State: APjAAAWd9cc9Gk/lb643I42GRIqRrCtiFJ4ZS0WGJYZzbDwSdoY5H9Av MBvmnFhqXfkXMa0vgGr3OUxosadIF3W/1A== X-Google-Smtp-Source: APXvYqxDKwBocKF0kHsI0v5Qd+lcW+QNKcn+4JoUvDlJBO0No4eQvS0wNQszTTzpBb8RW6z2P0BWhw== X-Received: by 2002:a17:902:2a07:: with SMTP id i7mr61366174plb.125.1558129301540; Fri, 17 May 2019 14:41:41 -0700 (PDT) Received: from x1.localdomain (66.29.164.166.static.utbb.net. [66.29.164.166]) by smtp.gmail.com with ESMTPSA id z125sm15885331pfb.75.2019.05.17.14.41.39 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 17 May 2019 14:41:40 -0700 (PDT) From: Jens Axboe To: linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org Cc: viro@zeniv.linux.org.uk, Jens Axboe Subject: [PATCH 3/3] io_uring: add support for sqe links Date: Fri, 17 May 2019 15:41:31 -0600 Message-Id: <20190517214131.5925-4-axboe@kernel.dk> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20190517214131.5925-1-axboe@kernel.dk> References: <20190517214131.5925-1-axboe@kernel.dk> Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP With SQE links, we can create chains of dependent SQEs. One example would be queueing an SQE that's a read from one file descriptor, with the linked SQE being a write to another with the same set of buffers. An SQE link will not stall the pipeline, it'll just ensure that dependent SQEs aren't issued before the previous link has completed. Any error at submission or completion time will break the chain of SQEs. For completions, this also includes short reads or writes, as the next SQE could depend on the previous one being fully completed. Any SQE in a chain that gets canceled due to any of the above errors, will get an CQE fill with -ECANCELED as the error value. Signed-off-by: Jens Axboe --- fs/io_uring.c | 211 +++++++++++++++++++++++++++------- include/uapi/linux/io_uring.h | 1 + 2 files changed, 172 insertions(+), 40 deletions(-) diff --git a/fs/io_uring.c b/fs/io_uring.c index 3e7f08094a85..b45313ed8337 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -322,6 +322,7 @@ struct io_kiocb { struct io_ring_ctx *ctx; struct list_head list; + struct list_head link_list; unsigned int flags; refcount_t refs; #define REQ_F_NOWAIT 1 /* must not punt to workers */ @@ -330,8 +331,10 @@ struct io_kiocb { #define REQ_F_SEQ_PREV 8 /* sequential with previous */ #define REQ_F_IO_DRAIN 16 /* drain existing IO first */ #define REQ_F_IO_DRAINED 32 /* drain done */ +#define REQ_F_LINK 64 /* linked sqes */ +#define REQ_F_FAIL_LINK 128 /* fail rest of links */ u64 user_data; - u32 error; /* iopoll result from callback */ + u32 result; u32 sequence; struct work_struct work; @@ -583,6 +586,7 @@ static struct io_kiocb *io_get_req(struct io_ring_ctx *ctx, req->flags = 0; /* one is dropped after submission, the other at completion */ refcount_set(&req->refs, 2); + req->result = 0; return req; out: io_ring_drop_ctx_refs(ctx, 1); @@ -598,7 +602,7 @@ static void io_free_req_many(struct io_ring_ctx *ctx, void **reqs, int *nr) } } -static void io_free_req(struct io_kiocb *req) +static void __io_free_req(struct io_kiocb *req) { if (req->file && !(req->flags & REQ_F_FIXED_FILE)) fput(req->file); @@ -606,6 +610,56 @@ static void io_free_req(struct io_kiocb *req) kmem_cache_free(req_cachep, req); } +static void io_req_link_next(struct io_kiocb *req) +{ + struct io_kiocb *nxt; + + nxt = list_first_entry_or_null(&req->link_list, struct io_kiocb, list); + list_del(&nxt->list); + if (!list_empty(&req->link_list)) { + INIT_LIST_HEAD(&nxt->link_list); + list_splice(&req->link_list, &nxt->link_list); + nxt->flags |= REQ_F_LINK; + } + + INIT_WORK(&nxt->work, io_sq_wq_submit_work); + queue_work(req->ctx->sqo_wq, &nxt->work); +} + +/* + * Called if REQ_F_LINK is set, and we fail the head request + */ +static void io_fail_links(struct io_kiocb *req) +{ + struct io_kiocb *link; + + while (!list_empty(&req->link_list)) { + link = list_first_entry(&req->link_list, struct io_kiocb, list); + list_del(&link->list); + + io_cqring_add_event(req->ctx, link->user_data, -ECANCELED); + __io_free_req(link); + } +} + +static void io_free_req(struct io_kiocb *req) +{ + /* + * If LINK is set, we have dependent requests in this chain. If we + * didn't fail this request, queue the first one up, moving any other + * dependencies to the next request. In case of failure, fail the rest + * of the chain. + */ + if (req->flags & REQ_F_LINK) { + if (req->flags & REQ_F_FAIL_LINK) + io_fail_links(req); + else + io_req_link_next(req); + } + + __io_free_req(req); +} + static void io_put_req(struct io_kiocb *req) { if (refcount_dec_and_test(&req->refs)) @@ -627,7 +681,7 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events, req = list_first_entry(done, struct io_kiocb, list); list_del(&req->list); - io_cqring_fill_event(ctx, req->user_data, req->error); + io_cqring_fill_event(ctx, req->user_data, req->result); (*nr_events)++; if (refcount_dec_and_test(&req->refs)) { @@ -636,7 +690,8 @@ static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events, * completions for those, only batch free for fixed * file. */ - if (req->flags & REQ_F_FIXED_FILE) { + if ((req->flags & (REQ_F_FIXED_FILE|REQ_F_LINK)) == + REQ_F_FIXED_FILE) { reqs[to_free++] = req; if (to_free == ARRAY_SIZE(reqs)) io_free_req_many(ctx, reqs, &to_free); @@ -775,6 +830,8 @@ static void io_complete_rw(struct kiocb *kiocb, long res, long res2) kiocb_end_write(kiocb); + if ((req->flags & REQ_F_LINK) && res != req->result) + req->flags |= REQ_F_FAIL_LINK; io_cqring_add_event(req->ctx, req->user_data, res); io_put_req(req); } @@ -785,7 +842,9 @@ static void io_complete_rw_iopoll(struct kiocb *kiocb, long res, long res2) kiocb_end_write(kiocb); - req->error = res; + if ((req->flags & REQ_F_LINK) && res != req->result) + req->flags |= REQ_F_FAIL_LINK; + req->result = res; if (res != -EAGAIN) req->flags |= REQ_F_IOPOLL_COMPLETED; } @@ -928,7 +987,6 @@ static int io_prep_rw(struct io_kiocb *req, const struct sqe_submit *s, !kiocb->ki_filp->f_op->iopoll) return -EOPNOTSUPP; - req->error = 0; kiocb->ki_flags |= IOCB_HIPRI; kiocb->ki_complete = io_complete_rw_iopoll; } else { @@ -1106,6 +1164,9 @@ static int io_read(struct io_kiocb *req, const struct sqe_submit *s, return ret; read_size = ret; + if (req->flags & REQ_F_LINK) + req->result = read_size; + iov_count = iov_iter_count(&iter); ret = rw_verify_area(READ, file, &kiocb->ki_pos, iov_count); if (!ret) { @@ -1163,6 +1224,9 @@ static int io_write(struct io_kiocb *req, const struct sqe_submit *s, if (ret < 0) return ret; + if (req->flags & REQ_F_LINK) + req->result = ret; + iov_count = iov_iter_count(&iter); ret = -EAGAIN; @@ -1266,6 +1330,8 @@ static int io_fsync(struct io_kiocb *req, const struct io_uring_sqe *sqe, end > 0 ? end : LLONG_MAX, fsync_flags & IORING_FSYNC_DATASYNC); + if (ret < 0) + req->flags |= REQ_F_FAIL_LINK; io_cqring_add_event(req->ctx, sqe->user_data, ret); io_put_req(req); return 0; @@ -1310,6 +1376,8 @@ static int io_sync_file_range(struct io_kiocb *req, ret = sync_file_range(req->rw.ki_filp, sqe_off, sqe_len, flags); + if (ret < 0) + req->flags |= REQ_F_FAIL_LINK; io_cqring_add_event(req->ctx, sqe->user_data, ret); io_put_req(req); return 0; @@ -1337,6 +1405,8 @@ static int io_sendmsg(struct io_kiocb *req, const struct io_uring_sqe *sqe, ret = __sys_sendmsg_sock(sock, msg, flags); } + if (ret < 0) + req->flags |= REQ_F_FAIL_LINK; io_cqring_add_event(req->ctx, sqe->user_data, ret); io_put_req(req); return 0; @@ -1367,6 +1437,8 @@ static int io_recvmsg(struct io_kiocb *req, const struct io_uring_sqe *sqe, ret = __sys_recvmsg_sock(sock, msg, flags); } + if (ret < 0) + req->flags |= REQ_F_FAIL_LINK; io_cqring_add_event(req->ctx, sqe->user_data, ret); io_put_req(req); return 0; @@ -1622,9 +1694,10 @@ static int __io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req, { int ret, opcode; + req->user_data = READ_ONCE(s->sqe->user_data); + if (unlikely(s->index >= ctx->sq_entries)) return -EINVAL; - req->user_data = READ_ONCE(s->sqe->user_data); opcode = READ_ONCE(s->sqe->opcode); switch (opcode) { @@ -1674,7 +1747,7 @@ static int __io_submit_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req, return ret; if (ctx->flags & IORING_SETUP_IOPOLL) { - if (req->error == -EAGAIN) + if (req->result == -EAGAIN) return -EAGAIN; /* workqueue context doesn't hold uring_lock, grab it now */ @@ -1900,31 +1973,11 @@ static int io_req_set_file(struct io_ring_ctx *ctx, const struct sqe_submit *s, return 0; } -static int io_submit_sqe(struct io_ring_ctx *ctx, struct sqe_submit *s, - struct io_submit_state *state) +static int io_queue_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req, + struct sqe_submit *s) { - struct io_kiocb *req; int ret; - /* enforce forwards compatibility on users */ - if (unlikely(s->sqe->flags & ~(IOSQE_FIXED_FILE | IOSQE_IO_DRAIN))) - return -EINVAL; - - req = io_get_req(ctx, state); - if (unlikely(!req)) - return -EAGAIN; - - ret = io_req_set_file(ctx, s, state, req); - if (unlikely(ret)) - goto out; - - ret = io_req_defer(ctx, req, s->sqe); - if (ret) { - if (ret == -EIOCBQUEUED) - ret = 0; - return ret; - } - ret = __io_submit_sqe(ctx, req, s, true); if (ret == -EAGAIN && !(req->flags & REQ_F_NOWAIT)) { struct io_uring_sqe *sqe_copy; @@ -1947,20 +2000,82 @@ static int io_submit_sqe(struct io_ring_ctx *ctx, struct sqe_submit *s, /* * Queued up for async execution, worker will release - * submit reference when the iocb is actually - * submitted. + * submit reference when the iocb is actually submitted. */ return 0; } } -out: /* drop submission reference */ io_put_req(req); /* and drop final reference, if we failed */ - if (ret) + if (ret) { + io_cqring_add_event(ctx, req->user_data, ret); + if (req->flags & REQ_F_LINK) + req->flags |= REQ_F_FAIL_LINK; io_put_req(req); + } + + return ret; +} + +#define SQE_VALID_FLAGS (IOSQE_FIXED_FILE|IOSQE_IO_DRAIN|IOSQE_IO_LINK) + +static int io_submit_sqe(struct io_ring_ctx *ctx, struct sqe_submit *s, + struct io_submit_state *state, struct io_kiocb **link) +{ + struct io_uring_sqe *sqe_copy; + struct io_kiocb *req; + int ret; + + /* enforce forwards compatibility on users */ + if (unlikely(s->sqe->flags & ~SQE_VALID_FLAGS)) + return -EINVAL; + + req = io_get_req(ctx, state); + if (unlikely(!req)) + return -EAGAIN; + + ret = io_req_set_file(ctx, s, state, req); + if (unlikely(ret)) { + io_free_req(req); + return ret; + } + + ret = io_req_defer(ctx, req, s->sqe); + if (ret) { + if (ret == -EIOCBQUEUED) + ret = 0; + return ret; + } + + /* + * If we already have a head request, queue this one for async + * submittal once the head completes. If we don't have a head but + * IOSQE_IO_LINK is set in the sqe, start a new head. This one will be + * submitted sync once the chain is complete. If none of those + * conditions are true (normal request), then just queue it. + */ + if (*link) { + struct io_kiocb *prev = *link; + + sqe_copy = kmemdup(s->sqe, sizeof(*sqe_copy), GFP_KERNEL); + if (!sqe_copy) + return -EAGAIN; + + s->sqe = sqe_copy; + memcpy(&req->submit, s, sizeof(*s)); + list_add_tail(&req->list, &prev->link_list); + } else if (s->sqe->flags & IOSQE_IO_LINK) { + req->flags |= REQ_F_LINK; + + memcpy(&req->submit, s, sizeof(*s)); + INIT_LIST_HEAD(&req->link_list); + *link = req; + } else { + ret = io_queue_sqe(ctx, req, s); + } return ret; } @@ -2047,6 +2162,8 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, struct sqe_submit *sqes, unsigned int nr, bool has_user, bool mm_fault) { struct io_submit_state state, *statep = NULL; + struct io_kiocb *link = NULL; + bool prev_was_link = false; int ret, i, submitted = 0; if (nr > IO_PLUG_THRESHOLD) { @@ -2055,22 +2172,28 @@ static int io_submit_sqes(struct io_ring_ctx *ctx, struct sqe_submit *sqes, } for (i = 0; i < nr; i++) { + if (!prev_was_link && link) { + io_queue_sqe(ctx, link, &link->submit); + link = NULL; + } + prev_was_link = (sqes[i].sqe->flags & IOSQE_IO_LINK) != 0; + if (unlikely(mm_fault)) { ret = -EFAULT; } else { sqes[i].has_user = has_user; sqes[i].needs_lock = true; sqes[i].needs_fixed_file = true; - ret = io_submit_sqe(ctx, &sqes[i], statep); + ret = io_submit_sqe(ctx, &sqes[i], statep, &link); } if (!ret) { submitted++; continue; } - - io_cqring_add_event(ctx, sqes[i].sqe->user_data, ret); } + if (link) + io_queue_sqe(ctx, link, &link->submit); if (statep) io_submit_state_end(&state); @@ -2211,6 +2334,8 @@ static int io_sq_thread(void *data) static int io_ring_submit(struct io_ring_ctx *ctx, unsigned int to_submit) { struct io_submit_state state, *statep = NULL; + struct io_kiocb *link = NULL; + bool prev_was_link = false; int i, submit = 0; if (to_submit > IO_PLUG_THRESHOLD) { @@ -2225,17 +2350,23 @@ static int io_ring_submit(struct io_ring_ctx *ctx, unsigned int to_submit) if (!io_get_sqring(ctx, &s)) break; + if (!prev_was_link && link) { + io_queue_sqe(ctx, link, &link->submit); + link = NULL; + } + prev_was_link = (s.sqe->flags & IOSQE_IO_LINK) != 0; + s.has_user = true; s.needs_lock = false; s.needs_fixed_file = false; submit++; - ret = io_submit_sqe(ctx, &s, statep); - if (ret) - io_cqring_add_event(ctx, s.sqe->user_data, ret); + ret = io_submit_sqe(ctx, &s, statep, &link); } io_commit_sqring(ctx); + if (link) + io_queue_sqe(ctx, link, &link->submit); if (statep) io_submit_state_end(statep); diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h index c5ea202684e3..1e1652f25cc1 100644 --- a/include/uapi/linux/io_uring.h +++ b/include/uapi/linux/io_uring.h @@ -41,6 +41,7 @@ struct io_uring_sqe { */ #define IOSQE_FIXED_FILE (1U << 0) /* use fixed fileset */ #define IOSQE_IO_DRAIN (1U << 1) /* issue after inflight IO */ +#define IOSQE_IO_LINK (1U << 2) /* links next sqe */ /* * io_uring_setup() flags