From patchwork Thu Jan 10 02:43:58 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 10755111
From: Jens Axboe
To: linux-fsdevel@vger.kernel.org, linux-aio@kvack.org,
 linux-block@vger.kernel.org, linux-arch@vger.kernel.org
Cc: hch@lst.de, jmoyer@redhat.com, avi@scylladb.com, Jens Axboe
Subject: [PATCH 09/15] io_uring: use fget/fput_many() for file references
Date: Wed, 9 Jan 2019 19:43:58 -0700
Message-Id: <20190110024404.25372-10-axboe@kernel.dk>
X-Mailer: git-send-email 2.17.1
In-Reply-To: <20190110024404.25372-1-axboe@kernel.dk>
References: <20190110024404.25372-1-axboe@kernel.dk>
X-Mailing-List: linux-fsdevel@vger.kernel.org

On the submission side, add file reference batching to the
io_submit_state. We get as many references as the number of iocbs we
are submitting, and drop unused ones if we end up switching files. The
assumption here is that we're usually only dealing with one fd, and if
there are multiple, hopefully they are at least somewhat ordered. Could
trivially be extended to cover multiple fds, if needed.

On the completion side we do the same thing, except this is trivially
done just locally in io_iopoll_reap().

Signed-off-by: Jens Axboe
---
 fs/io_uring.c | 105 +++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 92 insertions(+), 13 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index f7938156552f..cd2dfc153338 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -126,6 +126,15 @@ struct io_submit_state {
 	 */
 	struct list_head req_list;
 	unsigned int req_count;
+
+	/*
+	 * File reference cache
+	 */
+	struct file *file;
+	unsigned int fd;
+	unsigned int has_refs;
+	unsigned int used_refs;
+	unsigned int ios_left;
 };
 
 static struct kmem_cache *kiocb_cachep;
@@ -234,7 +243,8 @@ static void io_iopoll_reap(struct io_ring_ctx *ctx, unsigned int *nr_events)
 {
 	void *iocbs[IO_IOPOLL_BATCH];
 	struct io_kiocb *iocb, *n;
-	int to_free = 0;
+	int file_count, to_free = 0;
+	struct file *file = NULL;
 
 	list_for_each_entry_safe(iocb, n, &ctx->poll_completing, ki_list) {
 		if (!test_bit(KIOCB_F_IOPOLL_COMPLETED, &iocb->ki_flags))
@@ -245,10 +255,27 @@ static void io_iopoll_reap(struct io_ring_ctx *ctx, unsigned int *nr_events)
 		list_del(&iocb->ki_list);
 		iocbs[to_free++] = iocb;
 
-		fput(iocb->rw.ki_filp);
+		/*
+		 * Batched puts of the same file, to avoid dirtying the
+		 * file usage count multiple times, if avoidable.
+		 */
+		if (!file) {
+			file = iocb->rw.ki_filp;
+			file_count = 1;
+		} else if (file == iocb->rw.ki_filp) {
+			file_count++;
+		} else {
+			fput_many(file, file_count);
+			file = iocb->rw.ki_filp;
+			file_count = 1;
+		}
+
 		(*nr_events)++;
 	}
 
+	if (file)
+		fput_many(file, file_count);
+
 	if (to_free)
 		io_free_kiocb_many(ctx, iocbs, &to_free);
 }
@@ -428,13 +455,60 @@ static void io_complete_scqring_iopoll(struct kiocb *kiocb, long res, long res2)
 	}
 }
 
-static int io_prep_rw(struct io_kiocb *kiocb, const struct io_uring_sqe *sqe)
+static void io_file_put(struct io_submit_state *state, struct file *file)
+{
+	if (!state) {
+		fput(file);
+	} else if (state->file) {
+		int diff = state->has_refs - state->used_refs;
+
+		if (diff)
+			fput_many(state->file, diff);
+		state->file = NULL;
+	}
+}
+
+/*
+ * Get as many references to a file as we have IOs left in this submission,
+ * assuming most submissions are for one file, or at least that each file
+ * has more than one submission.
+ */
+static struct file *io_file_get(struct io_submit_state *state, int fd)
+{
+	if (!state)
+		return fget(fd);
+
+	if (!state->file) {
+get_file:
+		state->file = fget_many(fd, state->ios_left);
+		if (!state->file)
+			return NULL;
+
+		state->fd = fd;
+		state->has_refs = state->ios_left;
+		state->used_refs = 1;
+		state->ios_left--;
+		return state->file;
+	}
+
+	if (state->fd == fd) {
+		state->used_refs++;
+		state->ios_left--;
+		return state->file;
+	}
+
+	io_file_put(state, NULL);
+	goto get_file;
+}
+
+static int io_prep_rw(struct io_kiocb *kiocb, const struct io_uring_sqe *sqe,
+		      struct io_submit_state *state)
 {
 	struct io_ring_ctx *ctx = kiocb->ki_ctx;
 	struct kiocb *req = &kiocb->rw;
 	int ret;
 
-	req->ki_filp = fget(sqe->fd);
+	req->ki_filp = io_file_get(state, sqe->fd);
 	if (unlikely(!req->ki_filp))
 		return -EBADF;
 	req->ki_pos = sqe->off;
@@ -470,7 +544,7 @@ static int io_prep_rw(struct io_kiocb *kiocb, const struct io_uring_sqe *sqe)
 	}
 	return 0;
 out_fput:
-	fput(req->ki_filp);
+	io_file_put(state, req->ki_filp);
 	return ret;
 }
 
@@ -553,7 +627,8 @@ static void io_iopoll_kiocb_issued(struct io_submit_state *state,
 		io_iopoll_iocb_add_state(state, kiocb);
 }
 
-static ssize_t io_read(struct io_kiocb *kiocb, const struct io_uring_sqe *sqe)
+static ssize_t io_read(struct io_kiocb *kiocb, const struct io_uring_sqe *sqe,
+		       struct io_submit_state *state)
 {
 	struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
 	void __user *buf = (void __user *) (uintptr_t) sqe->addr;
@@ -562,7 +637,7 @@ static ssize_t io_read(struct io_kiocb *kiocb, const struct io_uring_sqe *sqe)
 	struct file *file;
 	ssize_t ret;
 
-	ret = io_prep_rw(kiocb, sqe);
+	ret = io_prep_rw(kiocb, sqe, state);
 	if (ret)
 		return ret;
 	file = req->ki_filp;
@@ -588,7 +663,8 @@ static ssize_t io_read(struct io_kiocb *kiocb, const struct io_uring_sqe *sqe)
 	return ret;
 }
 
-static ssize_t io_write(struct io_kiocb *kiocb, const struct io_uring_sqe *sqe)
+static ssize_t io_write(struct io_kiocb *kiocb, const struct io_uring_sqe *sqe,
+			struct io_submit_state *state)
 {
 	struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
 	void __user *buf = (void __user *) (uintptr_t) sqe->addr;
@@ -597,7 +673,7 @@ static ssize_t io_write(struct io_kiocb *kiocb, const struct io_uring_sqe *sqe)
 	struct file *file;
 	ssize_t ret;
 
-	ret = io_prep_rw(kiocb, sqe);
+	ret = io_prep_rw(kiocb, sqe, state);
 	if (ret)
 		return ret;
 	file = req->ki_filp;
@@ -697,10 +773,10 @@ static int io_submit_sqe(struct io_ring_ctx *ctx, struct sqe_submit *s,
 	ret = -EINVAL;
 	switch (sqe->opcode) {
 	case IORING_OP_READV:
-		ret = io_read(req, sqe);
+		ret = io_read(req, sqe, state);
 		break;
 	case IORING_OP_WRITEV:
-		ret = io_write(req, sqe);
+		ret = io_write(req, sqe, state);
 		break;
 	case IORING_OP_FSYNC:
 		ret = io_fsync(req, sqe, false);
@@ -751,17 +827,20 @@ static void io_submit_state_end(struct io_submit_state *state)
 	blk_finish_plug(&state->plug);
 	if (!list_empty(&state->req_list))
 		io_flush_state_reqs(state->ctx, state);
+	io_file_put(state, NULL);
 }
 
 /*
  * Start submission side cache.
  */
 static void io_submit_state_start(struct io_submit_state *state,
-				  struct io_ring_ctx *ctx)
+				  struct io_ring_ctx *ctx, unsigned max_ios)
 {
 	state->ctx = ctx;
 	INIT_LIST_HEAD(&state->req_list);
 	state->req_count = 0;
+	state->file = NULL;
+	state->ios_left = max_ios;
 #ifdef CONFIG_BLOCK
 	state->plug_cb.callback = io_state_unplug;
 	blk_start_plug(&state->plug);
@@ -807,7 +886,7 @@ static int io_ring_submit(struct io_ring_ctx *ctx, unsigned int to_submit)
 	int i, ret = 0, submit = 0;
 
 	if (to_submit > IO_PLUG_THRESHOLD) {
-		io_submit_state_start(&state, ctx);
+		io_submit_state_start(&state, ctx, to_submit);
 		statep = &state;
 	}