From patchwork Tue Feb 18 22:42:25 2025
X-Patchwork-Submitter: Keith Busch
X-Patchwork-Id: 13980994
From: Keith Busch
Subject: [PATCHv4 1/5] io_uring: move fixed buffer import to issue path
Date: Tue, 18 Feb 2025 14:42:25 -0800
Message-ID: <20250218224229.837848-2-kbusch@meta.com>
X-Mailer: git-send-email 2.43.5
In-Reply-To: <20250218224229.837848-1-kbusch@meta.com>
References: <20250218224229.837848-1-kbusch@meta.com>
X-Mailing-List: io-uring@vger.kernel.org
From: Keith Busch

Similar to the fixed file path, requests may depend on a previous one
to set up an index, so we need to allow linking them. The prep callback
happens too soon for linked commands, so the lookup needs to be deferred
to the issue path. Change the prep callbacks to just set the buf_index
and let generic io_uring code handle the fixed buffer node setup, just
like it already does for fixed files.

Signed-off-by: Keith Busch
Reviewed-by: Ming Lei
---
 include/linux/io_uring_types.h |  3 +++
 io_uring/io_uring.c            | 19 ++++++++++++++
 io_uring/net.c                 | 25 ++++++-------------
 io_uring/nop.c                 | 22 +++--------------
 io_uring/rw.c                  | 45 ++++++++++++++++++++++++----------
 io_uring/uring_cmd.c           | 16 ++----------
 6 files changed, 67 insertions(+), 63 deletions(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index c0fe8a00fe53a..0bcaefc4ffe02 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -482,6 +482,7 @@ enum {
 	REQ_F_DOUBLE_POLL_BIT,
 	REQ_F_APOLL_MULTISHOT_BIT,
 	REQ_F_CLEAR_POLLIN_BIT,
+	REQ_F_FIXED_BUFFER_BIT,
 	/* keep async read/write and isreg together and in order */
 	REQ_F_SUPPORT_NOWAIT_BIT,
 	REQ_F_ISREG_BIT,
@@ -574,6 +575,8 @@ enum {
 	REQ_F_BUF_NODE		= IO_REQ_FLAG(REQ_F_BUF_NODE_BIT),
 	/* request has read/write metadata assigned */
 	REQ_F_HAS_METADATA	= IO_REQ_FLAG(REQ_F_HAS_METADATA_BIT),
+	/* request has a fixed buffer at buf_index */
+	REQ_F_FIXED_BUFFER	= IO_REQ_FLAG(REQ_F_FIXED_BUFFER_BIT),
 };

 typedef void (*io_req_tw_func_t)(struct io_kiocb *req, io_tw_token_t tw);
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 58528bf61638e..7800edbc57279 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1721,6 +1721,23 @@ static bool io_assign_file(struct io_kiocb *req, const struct io_issue_def *def,
 	return !!req->file;
 }

+static bool io_assign_buffer(struct io_kiocb *req, unsigned int issue_flags)
+{
+	struct io_ring_ctx *ctx = req->ctx;
+	struct io_rsrc_node *node;
+
+	if (req->buf_node || !(req->flags & REQ_F_FIXED_BUFFER))
+		return true;
+
+	io_ring_submit_lock(ctx, issue_flags);
+	node = io_rsrc_node_lookup(&ctx->buf_table, req->buf_index);
+	if (node)
+		io_req_assign_buf_node(req, node);
+	io_ring_submit_unlock(ctx, issue_flags);
+
+	return !!node;
+}
+
 static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)
 {
 	const struct io_issue_def *def = &io_issue_defs[req->opcode];
@@ -1729,6 +1746,8 @@ static int io_issue_sqe(struct io_kiocb *req, unsigned int issue_flags)

 	if (unlikely(!io_assign_file(req, def, issue_flags)))
 		return -EBADF;
+	if (unlikely(!io_assign_buffer(req, issue_flags)))
+		return -EFAULT;

 	if (unlikely((req->flags & REQ_F_CREDS) && req->creds != current_cred()))
 		creds = override_creds(req->creds);
diff --git a/io_uring/net.c b/io_uring/net.c
index 000dc70d08d0d..39838e8575b53 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -1373,6 +1373,10 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 #endif
 	if (unlikely(!io_msg_alloc_async(req)))
 		return -ENOMEM;
+	if (zc->flags & IORING_RECVSEND_FIXED_BUF) {
+		req->buf_index = zc->buf_index;
+		req->flags |= REQ_F_FIXED_BUFFER;
+	}
 	if (req->opcode != IORING_OP_SENDMSG_ZC)
 		return io_send_setup(req, sqe);
 	return io_sendmsg_setup(req, sqe);
@@ -1434,25 +1438,10 @@ static int io_send_zc_import(struct io_kiocb *req, unsigned int issue_flags)
 	struct io_async_msghdr *kmsg = req->async_data;
 	int ret;

-	if (sr->flags & IORING_RECVSEND_FIXED_BUF) {
-		struct io_ring_ctx *ctx = req->ctx;
-		struct io_rsrc_node *node;
-
-		ret = -EFAULT;
-		io_ring_submit_lock(ctx, issue_flags);
-		node = io_rsrc_node_lookup(&ctx->buf_table, sr->buf_index);
-		if (node) {
-			io_req_assign_buf_node(sr->notif, node);
-			ret = 0;
-		}
-		io_ring_submit_unlock(ctx, issue_flags);
-
-		if (unlikely(ret))
-			return ret;
-
+	if (req->flags & REQ_F_FIXED_BUFFER) {
 		ret = io_import_fixed(ITER_SOURCE, &kmsg->msg.msg_iter,
-					node->buf, (u64)(uintptr_t)sr->buf,
-					sr->len);
+				      req->buf_node->buf,
+				      (u64)(uintptr_t)sr->buf, sr->len);
 		if (unlikely(ret))
 			return ret;
 		kmsg->msg.sg_from_iter = io_sg_from_iter;
diff --git a/io_uring/nop.c b/io_uring/nop.c
index 5e5196df650a1..989908603112f 100644
--- a/io_uring/nop.c
+++ b/io_uring/nop.c
@@ -16,7 +16,6 @@ struct io_nop {
 	struct file	*file;
 	int		result;
 	int		fd;
-	int		buffer;
 	unsigned int	flags;
 };

@@ -39,10 +38,10 @@ int io_nop_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 		nop->fd = READ_ONCE(sqe->fd);
 	else
 		nop->fd = -1;
-	if (nop->flags & IORING_NOP_FIXED_BUFFER)
-		nop->buffer = READ_ONCE(sqe->buf_index);
-	else
-		nop->buffer = -1;
+	if (nop->flags & IORING_NOP_FIXED_BUFFER) {
+		req->buf_index = READ_ONCE(sqe->buf_index);
+		req->flags |= REQ_F_FIXED_BUFFER;
+	}
 	return 0;
 }

@@ -63,19 +62,6 @@ int io_nop(struct io_kiocb *req, unsigned int issue_flags)
 			goto done;
 		}
 	}
-	if (nop->flags & IORING_NOP_FIXED_BUFFER) {
-		struct io_ring_ctx *ctx = req->ctx;
-		struct io_rsrc_node *node;
-
-		ret = -EFAULT;
-		io_ring_submit_lock(ctx, issue_flags);
-		node = io_rsrc_node_lookup(&ctx->buf_table, nop->buffer);
-		if (node) {
-			io_req_assign_buf_node(req, node);
-			ret = 0;
-		}
-		io_ring_submit_unlock(ctx, issue_flags);
-	}
 done:
 	if (ret < 0)
 		req_set_fail(req);
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 16f12f94943f7..2d8910d9197a0 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -353,25 +353,14 @@ int io_prep_writev(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 static int io_prep_rw_fixed(struct io_kiocb *req, const struct io_uring_sqe *sqe,
 			    int ddir)
 {
-	struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
-	struct io_ring_ctx *ctx = req->ctx;
-	struct io_rsrc_node *node;
-	struct io_async_rw *io;
 	int ret;

 	ret = io_prep_rw(req, sqe, ddir, false);
 	if (unlikely(ret))
 		return ret;

-	node = io_rsrc_node_lookup(&ctx->buf_table, req->buf_index);
-	if (!node)
-		return -EFAULT;
-	io_req_assign_buf_node(req, node);
-
-	io = req->async_data;
-	ret = io_import_fixed(ddir, &io->iter, node->buf, rw->addr, rw->len);
-	iov_iter_save_state(&io->iter, &io->iter_state);
-	return ret;
+	req->flags |= REQ_F_FIXED_BUFFER;
+	return 0;
 }

 int io_prep_read_fixed(struct io_kiocb *req, const struct io_uring_sqe *sqe)
@@ -954,10 +943,36 @@ static int __io_read(struct io_kiocb *req, unsigned int issue_flags)
 	return ret;
 }

+static int io_import_fixed_buffer(struct io_kiocb *req, int ddir)
+{
+	struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
+	struct io_async_rw *io;
+	int ret;
+
+	if (!(req->flags & REQ_F_FIXED_BUFFER))
+		return 0;
+
+	io = req->async_data;
+	if (io->bytes_done)
+		return 0;
+
+	ret = io_import_fixed(ddir, &io->iter, req->buf_node->buf, rw->addr,
+			      rw->len);
+	if (ret)
+		return ret;
+
+	iov_iter_save_state(&io->iter, &io->iter_state);
+	return 0;
+}
+
 int io_read(struct io_kiocb *req, unsigned int issue_flags)
 {
 	int ret;

+	ret = io_import_fixed_buffer(req, READ);
+	if (unlikely(ret))
+		return ret;
+
 	ret = __io_read(req, issue_flags);
 	if (ret >= 0)
 		return kiocb_done(req, ret, issue_flags);
@@ -1062,6 +1077,10 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags)
 	ssize_t ret, ret2;
 	loff_t *ppos;

+	ret = io_import_fixed_buffer(req, WRITE);
+	if (unlikely(ret))
+		return ret;
+
 	ret = io_rw_init_file(req, FMODE_WRITE, WRITE);
 	if (unlikely(ret))
 		return ret;
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 8bdf2c9b3fef9..112b49fde23e5 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -200,19 +200,8 @@ int io_uring_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 		return -EINVAL;

 	if (ioucmd->flags & IORING_URING_CMD_FIXED) {
-		struct io_ring_ctx *ctx = req->ctx;
-		struct io_rsrc_node *node;
-		u16 index = READ_ONCE(sqe->buf_index);
-
-		node = io_rsrc_node_lookup(&ctx->buf_table, index);
-		if (unlikely(!node))
-			return -EFAULT;
-		/*
-		 * Pin node upfront, prior to io_uring_cmd_import_fixed()
-		 * being called. This prevents destruction of the mapped buffer
-		 * we'll need at actual import time.
-		 */
-		io_req_assign_buf_node(req, node);
+		req->buf_index = READ_ONCE(sqe->buf_index);
+		req->flags |= REQ_F_FIXED_BUFFER;
 	}

 	ioucmd->cmd_op = READ_ONCE(sqe->cmd_op);
@@ -262,7 +251,6 @@ int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
 	struct io_kiocb *req = cmd_to_io_kiocb(ioucmd);
 	struct io_rsrc_node *node = req->buf_node;

-	/* Must have had rsrc_node assigned at prep time */
 	if (node)
 		return io_import_fixed(rw, iter, node->buf, ubuf, len);
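
[Editorial sketch, not part of the patch: the linking mechanics this change
enables are easiest to see from userspace. Below is a hedged liburing
example of two fixed-buffer ops on slot 0 joined with IOSQE_IO_LINK. Prep
runs for every SQE at submit time, so an import done at prep could never
see a buffer that an earlier linked op installs; deferring the lookup to
issue makes "register a buffer, then use it" chains (added later in this
series) workable. /dev/zero and /dev/null stand in for real files and
error handling is trimmed.]

	#include <fcntl.h>
	#include <stdlib.h>
	#include <unistd.h>
	#include <liburing.h>

	int main(void)
	{
		struct io_uring ring;
		struct io_uring_sqe *sqe;
		struct io_uring_cqe *cqe;
		struct iovec iov = { .iov_base = malloc(4096), .iov_len = 4096 };
		int in_fd = open("/dev/zero", O_RDONLY);
		int out_fd = open("/dev/null", O_WRONLY);

		io_uring_queue_init(8, &ring, 0);
		io_uring_register_buffers(&ring, &iov, 1);

		/* read into fixed-buffer slot 0, linked to the next SQE */
		sqe = io_uring_get_sqe(&ring);
		io_uring_prep_read_fixed(sqe, in_fd, iov.iov_base, 4096, 0, 0);
		sqe->flags |= IOSQE_IO_LINK;

		/* linked consumer of the same slot; with this patch, its
		 * buffer lookup happens only when it is actually issued */
		sqe = io_uring_get_sqe(&ring);
		io_uring_prep_write_fixed(sqe, out_fd, iov.iov_base, 4096, 0, 0);

		io_uring_submit(&ring);
		io_uring_wait_cqe(&ring, &cqe);
		io_uring_cqe_seen(&ring, cqe);
		io_uring_queue_exit(&ring);
		return 0;
	}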
From patchwork Tue Feb 18 22:42:26 2025
X-Patchwork-Submitter: Keith Busch
X-Patchwork-Id: 13980993
From: Keith Busch
Subject: [PATCHv4 2/5] io_uring: add support for kernel registered bvecs
Date: Tue, 18 Feb 2025 14:42:26 -0800
Message-ID: <20250218224229.837848-3-kbusch@meta.com>
X-Mailer: git-send-email 2.43.5
In-Reply-To: <20250218224229.837848-1-kbusch@meta.com>
References: <20250218224229.837848-1-kbusch@meta.com>
X-Mailing-List: io-uring@vger.kernel.org

From: Keith Busch

Provide an interface for the kernel to leverage the existing
pre-registered buffers that io_uring provides. User space can reference
these later to achieve zero-copy IO.

User space must register an empty fixed buffer table with io_uring in
order for the kernel to make use of it.

Signed-off-by: Keith Busch
---
 include/linux/io_uring.h       |   1 +
 include/linux/io_uring_types.h |   6 ++
 io_uring/rsrc.c                | 115 ++++++++++++++++++++++++++++++---
 io_uring/rsrc.h                |   4 ++
 4 files changed, 118 insertions(+), 8 deletions(-)

diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index 85fe4e6b275c7..b5637a2aae340 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -5,6 +5,7 @@
 #include <linux/sched.h>
 #include <linux/xarray.h>
 #include <uapi/linux/io_uring.h>
+#include <linux/blk-mq.h>

 #if defined(CONFIG_IO_URING)
 void __io_uring_cancel(bool cancel_all);
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 0bcaefc4ffe02..2aed51e8c79ee 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -709,4 +709,10 @@ static inline bool io_ctx_cqe32(struct io_ring_ctx *ctx)
 	return ctx->flags & IORING_SETUP_CQE32;
 }

+int io_buffer_register_bvec(struct io_ring_ctx *ctx, struct request *rq,
+			    void (*release)(void *), unsigned int index,
+			    unsigned int issue_flags);
+void io_buffer_unregister_bvec(struct io_ring_ctx *ctx, unsigned int index,
+			       unsigned int issue_flags);
+
 #endif
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 20b884c84e55f..88bcacc77b72e 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -103,19 +103,23 @@ int io_buffer_validate(struct iovec *iov)

 static void io_buffer_unmap(struct io_ring_ctx *ctx, struct io_rsrc_node *node)
 {
-	unsigned int i;
+	struct io_mapped_ubuf *imu = node->buf;

-	if (node->buf) {
-		struct io_mapped_ubuf *imu = node->buf;
+	if (!refcount_dec_and_test(&imu->refs))
+		return;
+
+	if (imu->release) {
+		imu->release(imu->priv);
+	} else {
+		unsigned int i;

-		if (!refcount_dec_and_test(&imu->refs))
-			return;
 		for (i = 0; i < imu->nr_bvecs; i++)
 			unpin_user_page(imu->bvec[i].bv_page);
 		if (imu->acct_pages)
 			io_unaccount_mem(ctx, imu->acct_pages);
-		kvfree(imu);
 	}
+
+	kvfree(imu);
 }

 struct io_rsrc_node *io_rsrc_node_alloc(int type)
@@ -764,6 +768,10 @@ static struct io_rsrc_node *io_sqe_buffer_register(struct io_ring_ctx *ctx,
 	imu->len = iov->iov_len;
 	imu->nr_bvecs = nr_pages;
 	imu->folio_shift = PAGE_SHIFT;
+	imu->release = NULL;
+	imu->priv = NULL;
+	imu->readable = true;
+	imu->writeable = true;
 	if (coalesced)
 		imu->folio_shift = data.folio_shift;
 	refcount_set(&imu->refs, 1);
@@ -860,6 +868,87 @@ int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
 	return ret;
 }

+int io_buffer_register_bvec(struct io_ring_ctx *ctx, struct request *rq,
+			    void (*release)(void *), unsigned int index,
+			    unsigned int issue_flags)
+{
+	struct io_rsrc_data *data = &ctx->buf_table;
+	struct req_iterator rq_iter;
+	struct io_mapped_ubuf *imu;
+	struct io_rsrc_node *node;
+	struct bio_vec bv, *bvec;
+	int ret = 0;
+	u16 nr_bvecs;
+
+	io_ring_submit_lock(ctx, issue_flags);
+
+	if (io_rsrc_node_lookup(data, index)) {
+		ret = -EBUSY;
+		goto unlock;
+	}
+
+	node = io_rsrc_node_alloc(IORING_RSRC_BUFFER);
+	if (!node) {
+		ret = -ENOMEM;
+		goto unlock;
+	}
+
+	nr_bvecs = blk_rq_nr_phys_segments(rq);
+	imu = kvmalloc(struct_size(imu, bvec, nr_bvecs), GFP_KERNEL);
+	if (!imu) {
+		kfree(node);
+		ret = -ENOMEM;
+		goto unlock;
+	}
+
+	imu->ubuf = 0;
+	imu->len = blk_rq_bytes(rq);
+	imu->acct_pages = 0;
+	imu->folio_shift = PAGE_SHIFT;
+	imu->nr_bvecs = nr_bvecs;
+	refcount_set(&imu->refs, 1);
+	imu->release = release;
+	imu->priv = rq;
+
+	if (rq_data_dir(rq))
+		imu->writeable = true;
+	else
+		imu->readable = true;
+
+	bvec = imu->bvec;
+	rq_for_each_bvec(bv, rq, rq_iter)
+		*bvec++ = bv;
+
+	node->buf = imu;
+	data->nodes[index] = node;
+unlock:
+	io_ring_submit_unlock(ctx, issue_flags);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(io_buffer_register_bvec);
+
+void io_buffer_unregister_bvec(struct io_ring_ctx *ctx, unsigned int index,
+			       unsigned int issue_flags)
+{
+	struct io_rsrc_data *data = &ctx->buf_table;
+	struct io_rsrc_node *node;
+
+	io_ring_submit_lock(ctx, issue_flags);
+
+	if (!data->nr)
+		goto unlock;
+
+	node = io_rsrc_node_lookup(data, index);
+	if (!node || !node->buf->release)
+		goto unlock;
+
+	io_put_rsrc_node(ctx, node);
+	data->nodes[index] = NULL;
+unlock:
+	io_ring_submit_unlock(ctx, issue_flags);
+}
+EXPORT_SYMBOL_GPL(io_buffer_unregister_bvec);
+
 int io_import_fixed(int ddir, struct iov_iter *iter,
 			   struct io_mapped_ubuf *imu,
 			   u64 buf_addr, size_t len)
@@ -874,6 +963,9 @@ int io_import_fixed(int ddir, struct iov_iter *iter,
 	/* not inside the mapped region */
 	if (unlikely(buf_addr < imu->ubuf || buf_end > (imu->ubuf + imu->len)))
 		return -EFAULT;
+	if ((ddir == READ && !imu->readable) ||
+	    (ddir == WRITE && !imu->writeable))
+		return -EFAULT;

 	/*
 	 * Might not be a start of buffer, set size appropriately
@@ -886,8 +978,8 @@ int io_import_fixed(int ddir, struct iov_iter *iter,
 		/*
 		 * Don't use iov_iter_advance() here, as it's really slow for
 		 * using the latter parts of a big fixed buffer - it iterates
-		 * over each segment manually. We can cheat a bit here, because
-		 * we know that:
+		 * over each segment manually. We can cheat a bit here for user
+		 * registered nodes, because we know that:
 		 *
 		 * 1) it's a BVEC iter, we set it up
 		 * 2) all bvecs are the same in size, except potentially the
@@ -901,8 +993,15 @@ int io_import_fixed(int ddir, struct iov_iter *iter,
 		 */
 		const struct bio_vec *bvec = imu->bvec;

+		/*
+		 * Kernel buffer bvecs, on the other hand, don't necessarily
+		 * have the size property of user registered ones, so we have
+		 * to use the slow iter advance.
+		 */
 		if (offset < bvec->bv_len) {
 			iter->iov_offset = offset;
+		} else if (imu->release) {
+			iov_iter_advance(iter, offset);
 		} else {
 			unsigned long seg_skip;
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index abf86b5b86140..81c31b93d4f7b 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -33,6 +33,10 @@ struct io_mapped_ubuf {
 	unsigned int	folio_shift;
 	refcount_t	refs;
 	unsigned long	acct_pages;
+	void		(*release)(void *);
+	void		*priv;
+	bool		readable;
+	bool		writeable;
 	struct bio_vec	bvec[] __counted_by(nr_bvecs);
 };
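
[Editorial sketch, not part of the patch: the driver-side contract of the
new exports. Only io_buffer_register_bvec() and io_buffer_unregister_bvec()
come from this patch; the my_drv_* reference-counting helpers are
hypothetical stand-ins for whatever lifetime tracking a driver already has
(the ublk patch later in this series does exactly this with its own
request refs). The release callback fires when the last reference to the
mapped buffer drops, so the driver pins the request across the
registration and unpins it from the callback.]

	/* hypothetical driver callback, invoked on final buffer put */
	static void my_drv_buf_release(void *priv)
	{
		struct request *rq = priv;

		my_drv_put_req_ref(rq);		/* hypothetical: drop the pin */
	}

	static int my_drv_attach_rq_to_ring(struct io_ring_ctx *ctx,
					    struct request *rq,
					    unsigned int index,
					    unsigned int issue_flags)
	{
		int ret;

		my_drv_get_req_ref(rq);		/* hypothetical: pin rq first */
		ret = io_buffer_register_bvec(ctx, rq, my_drv_buf_release,
					      index, issue_flags);
		if (ret)	/* e.g. -EBUSY if the buf_table slot is taken */
			my_drv_put_req_ref(rq);
		return ret;
	}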
From patchwork Tue Feb 18 22:42:27 2025
X-Patchwork-Submitter: Keith Busch
X-Patchwork-Id: 13980995
From: Keith Busch
Subject: [PATCHv4 3/5] ublk: zc register/unregister bvec
Date: Tue, 18 Feb 2025 14:42:27 -0800
Message-ID: <20250218224229.837848-4-kbusch@meta.com>
X-Mailer: git-send-email 2.43.5
In-Reply-To: <20250218224229.837848-1-kbusch@meta.com>
References: <20250218224229.837848-1-kbusch@meta.com>
X-Mailing-List: io-uring@vger.kernel.org

From: Keith Busch

Provide new operations for the user to request mapping an active request
to an io uring instance's buf_table. The user must provide the index at
which it wants to install the buffer. A reference count is taken on the
request to ensure it can't be completed while it is active in a ring's
buf_table.

Signed-off-by: Keith Busch
---
 drivers/block/ublk_drv.c      | 137 +++++++++++++++++++++++++---------
 include/uapi/linux/ublk_cmd.h |   4 +
 2 files changed, 105 insertions(+), 36 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 529085181f355..0c753176b14e9 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -51,6 +51,9 @@
 /* private ioctl command mirror */
 #define UBLK_CMD_DEL_DEV_ASYNC	_IOC_NR(UBLK_U_CMD_DEL_DEV_ASYNC)

+#define UBLK_IO_REGISTER_IO_BUF		_IOC_NR(UBLK_U_IO_REGISTER_IO_BUF)
+#define UBLK_IO_UNREGISTER_IO_BUF	_IOC_NR(UBLK_U_IO_UNREGISTER_IO_BUF)
+
 /* All UBLK_F_* have to be included into UBLK_F_ALL */
 #define UBLK_F_ALL (UBLK_F_SUPPORT_ZERO_COPY \
 		| UBLK_F_URING_CMD_COMP_IN_TASK \
@@ -201,7 +204,7 @@ static inline struct ublksrv_io_desc *ublk_get_iod(struct ublk_queue *ubq,
 						   int tag);
 static inline bool ublk_dev_is_user_copy(const struct ublk_device *ub)
 {
-	return ub->dev_info.flags & UBLK_F_USER_COPY;
+	return ub->dev_info.flags & (UBLK_F_USER_COPY | UBLK_F_SUPPORT_ZERO_COPY);
 }

 static inline bool ublk_dev_is_zoned(const struct ublk_device *ub)
@@ -581,7 +584,7 @@ static void ublk_apply_params(struct ublk_device *ub)

 static inline bool ublk_support_user_copy(const struct ublk_queue *ubq)
 {
-	return ubq->flags & UBLK_F_USER_COPY;
+	return ubq->flags & (UBLK_F_USER_COPY | UBLK_F_SUPPORT_ZERO_COPY);
 }

 static inline bool ublk_need_req_ref(const struct ublk_queue *ubq)
@@ -1747,6 +1750,96 @@ static inline void ublk_prep_cancel(struct io_uring_cmd *cmd,
 	io_uring_cmd_mark_cancelable(cmd, issue_flags);
 }

+static inline struct request *__ublk_check_and_get_req(struct ublk_device *ub,
+		struct ublk_queue *ubq, int tag, size_t offset)
+{
+	struct request *req;
+
+	if (!ublk_need_req_ref(ubq))
+		return NULL;
+
+	req = blk_mq_tag_to_rq(ub->tag_set.tags[ubq->q_id], tag);
+	if (!req)
+		return NULL;
+
+	if (!ublk_get_req_ref(ubq, req))
+		return NULL;
+
+	if (unlikely(!blk_mq_request_started(req) || req->tag != tag))
+		goto fail_put;
+
+	if (!ublk_rq_has_data(req))
+		goto fail_put;
+
+	if (offset > blk_rq_bytes(req))
+		goto fail_put;
+
+	return req;
+fail_put:
+	ublk_put_req_ref(ubq, req);
+	return NULL;
+}
+
+static void ublk_io_release(void *priv)
+{
+	struct request *rq = priv;
+	struct ublk_queue *ubq = rq->mq_hctx->driver_data;
+
+	ublk_put_req_ref(ubq, rq);
+}
+
+static int ublk_register_io_buf(struct io_uring_cmd *cmd,
+				struct ublk_queue *ubq, int tag,
+				const struct ublksrv_io_cmd *ub_cmd,
+				unsigned int issue_flags)
+{
+	struct io_ring_ctx *ctx = cmd_to_io_kiocb(cmd)->ctx;
+	struct ublk_device *ub = cmd->file->private_data;
+	int index = (int)ub_cmd->addr, ret;
+	struct ublk_rq_data *data;
+	struct request *req;
+
+	if (!ub)
+		return -EPERM;
+
+	req = __ublk_check_and_get_req(ub, ubq, tag, 0);
+	if (!req)
+		return -EINVAL;
+
+	data = blk_mq_rq_to_pdu(req);
+	ret = io_buffer_register_bvec(ctx, req, ublk_io_release, index,
+				      issue_flags);
+	if (ret) {
+		ublk_put_req_ref(ubq, req);
+		return ret;
+	}
+
+	return 0;
+}
+
+static int ublk_unregister_io_buf(struct io_uring_cmd *cmd,
+				  struct ublk_queue *ubq, int tag,
+				  const struct ublksrv_io_cmd *ub_cmd,
+				  unsigned int issue_flags)
+{
+	struct io_ring_ctx *ctx = cmd_to_io_kiocb(cmd)->ctx;
+	struct ublk_device *ub = cmd->file->private_data;
+	int index = (int)ub_cmd->addr;
+	struct ublk_rq_data *data;
+	struct request *req;
+
+	if (!ub)
+		return -EPERM;
+
+	req = blk_mq_tag_to_rq(ub->tag_set.tags[ubq->q_id], tag);
+	if (!req)
+		return -EINVAL;
+
+	data = blk_mq_rq_to_pdu(req);
+	io_buffer_unregister_bvec(ctx, index, issue_flags);
+	return 0;
+}
+
 static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
 			       unsigned int issue_flags,
 			       const struct ublksrv_io_cmd *ub_cmd)
@@ -1798,6 +1891,11 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,

 	ret = -EINVAL;
 	switch (_IOC_NR(cmd_op)) {
+	case UBLK_IO_REGISTER_IO_BUF:
+		return ublk_register_io_buf(cmd, ubq, tag, ub_cmd, issue_flags);
+	case UBLK_IO_UNREGISTER_IO_BUF:
+		return ublk_unregister_io_buf(cmd, ubq, tag, ub_cmd,
+					      issue_flags);
 	case UBLK_IO_FETCH_REQ:
 		/* UBLK_IO_FETCH_REQ is only allowed before queue is setup */
 		if (ublk_queue_ready(ubq)) {
@@ -1872,36 +1970,6 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
 	return -EIOCBQUEUED;
 }

-static inline struct request *__ublk_check_and_get_req(struct ublk_device *ub,
-		struct ublk_queue *ubq, int tag, size_t offset)
-{
-	struct request *req;
-
-	if (!ublk_need_req_ref(ubq))
-		return NULL;
-
-	req = blk_mq_tag_to_rq(ub->tag_set.tags[ubq->q_id], tag);
-	if (!req)
-		return NULL;
-
-	if (!ublk_get_req_ref(ubq, req))
-		return NULL;
-
-	if (unlikely(!blk_mq_request_started(req) || req->tag != tag))
-		goto fail_put;
-
-	if (!ublk_rq_has_data(req))
-		goto fail_put;
-
-	if (offset > blk_rq_bytes(req))
-		goto fail_put;
-
-	return req;
-fail_put:
-	ublk_put_req_ref(ubq, req);
-	return NULL;
-}
-
 static inline int ublk_ch_uring_cmd_local(struct io_uring_cmd *cmd,
 					  unsigned int issue_flags)
 {
@@ -2527,9 +2595,6 @@ static int ublk_ctrl_add_dev(struct io_uring_cmd *cmd)
 		goto out_free_dev_number;
 	}

-	/* We are not ready to support zero copy */
-	ub->dev_info.flags &= ~UBLK_F_SUPPORT_ZERO_COPY;
-
 	ub->dev_info.nr_hw_queues = min_t(unsigned int,
 			ub->dev_info.nr_hw_queues, nr_cpu_ids);
 	ublk_align_max_io_size(ub);
@@ -2860,7 +2925,7 @@ static int ublk_ctrl_get_features(struct io_uring_cmd *cmd)
 {
 	const struct ublksrv_ctrl_cmd *header = io_uring_sqe_cmd(cmd->sqe);
 	void __user *argp = (void __user *)(unsigned long)header->addr;
-	u64 features = UBLK_F_ALL & ~UBLK_F_SUPPORT_ZERO_COPY;
+	u64 features = UBLK_F_ALL;

 	if (header->len != UBLK_FEATURES_LEN || !header->addr)
 		return -EINVAL;
diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h
index a8bc98bb69fce..74246c926b55f 100644
--- a/include/uapi/linux/ublk_cmd.h
+++ b/include/uapi/linux/ublk_cmd.h
@@ -94,6 +94,10 @@
 	_IOWR('u', UBLK_IO_COMMIT_AND_FETCH_REQ, struct ublksrv_io_cmd)
 #define	UBLK_U_IO_NEED_GET_DATA	\
 	_IOWR('u', UBLK_IO_NEED_GET_DATA, struct ublksrv_io_cmd)
+#define	UBLK_U_IO_REGISTER_IO_BUF	\
+	_IOWR('u', 0x23, struct ublksrv_io_cmd)
+#define	UBLK_U_IO_UNREGISTER_IO_BUF	\
+	_IOWR('u', 0x24, struct ublksrv_io_cmd)

 /* only ABORT means that no re-fetch */
 #define UBLK_IO_RES_OK		0
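
[Editorial sketch, not part of the patch: a hedged userspace view of the
zero-copy flow a ublk server could use once this lands, flushing a ublk
write request to a backing file without a bounce copy. It assumes the
ring registered a sparse buffer table first, e.g.
io_uring_register_buffers_sparse(ring, 1), per the previous patch; the
fds, q_id and tag are placeholders, the SQE field names follow the
kernel/liburing uapi headers, and error handling is omitted.]

	#include <string.h>
	#include <liburing.h>
	#include <linux/ublk_cmd.h>

	static void prep_ublk_buf_cmd(struct io_uring_sqe *sqe, int ublk_ch_fd,
				      __u32 cmd_op, __u16 q_id, __u16 tag,
				      __u64 buf_index)
	{
		struct ublksrv_io_cmd *cmd = (struct ublksrv_io_cmd *)sqe->cmd;

		memset(sqe, 0, sizeof(*sqe));
		sqe->opcode = IORING_OP_URING_CMD;
		sqe->fd = ublk_ch_fd;
		sqe->cmd_op = cmd_op;
		cmd->q_id = q_id;
		cmd->tag = tag;
		cmd->addr = buf_index;	/* ub_cmd->addr carries the index */
	}

	static void queue_zc_write(struct io_uring *ring, int ublk_ch_fd,
				   int backing_fd, __u16 q_id, __u16 tag,
				   __u32 len, __u64 off)
	{
		struct io_uring_sqe *sqe;

		/* 1) pin the request's pages at fixed-buffer index 0 */
		sqe = io_uring_get_sqe(ring);
		prep_ublk_buf_cmd(sqe, ublk_ch_fd, UBLK_U_IO_REGISTER_IO_BUF,
				  q_id, tag, 0);
		sqe->flags |= IOSQE_IO_LINK;

		/* 2) write those pages out; addr 0 means buffer start, since
		 * kernel-registered bvecs have imu->ubuf == 0 */
		sqe = io_uring_get_sqe(ring);
		io_uring_prep_write_fixed(sqe, backing_fd, NULL, len, off, 0);
		sqe->flags |= IOSQE_IO_LINK;

		/* 3) drop the registration once the write completes */
		sqe = io_uring_get_sqe(ring);
		prep_ublk_buf_cmd(sqe, ublk_ch_fd, UBLK_U_IO_UNREGISTER_IO_BUF,
				  q_id, tag, 0);

		io_uring_submit(ring);
	}

Note that step 2 only works in a link because patch 1 of this series
moved the fixed-buffer lookup to the issue path.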
From patchwork Tue Feb 18 22:42:28 2025
X-Patchwork-Submitter: Keith Busch
X-Patchwork-Id: 13980996
From: Keith Busch
Subject: [PATCHv4 4/5] io_uring: add abstraction for buf_table rsrc data
Date: Tue, 18 Feb 2025 14:42:28 -0800
Message-ID: <20250218224229.837848-5-kbusch@meta.com>
X-Mailer: git-send-email 2.43.5
In-Reply-To: <20250218224229.837848-1-kbusch@meta.com>
References: <20250218224229.837848-1-kbusch@meta.com>
X-Mailing-List: io-uring@vger.kernel.org

From: Keith Busch

We'll need to add more fields specific to the registered buffers, so
make a layer for it now. No functional change in this patch.

Signed-off-by: Keith Busch
Reviewed-by: Caleb Sander Mateos
---
 include/linux/io_uring_types.h |  6 ++++-
 io_uring/fdinfo.c              |  8 +++---
 io_uring/io_uring.c            |  2 +-
 io_uring/register.c            |  2 +-
 io_uring/rsrc.c                | 49 +++++++++++++++++-----------------
 5 files changed, 35 insertions(+), 32 deletions(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 2aed51e8c79ee..810d1dccd27b1 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -69,6 +69,10 @@ struct io_file_table {
 	unsigned int alloc_hint;
 };

+struct io_buf_table {
+	struct io_rsrc_data data;
+};
+
 struct io_hash_bucket {
 	struct hlist_head	list;
 } ____cacheline_aligned_in_smp;
@@ -293,7 +297,7 @@ struct io_ring_ctx {
 		struct io_wq_work_list	iopoll_list;

 		struct io_file_table	file_table;
-		struct io_rsrc_data	buf_table;
+		struct io_buf_table	buf_table;

 		struct io_submit_state	submit_state;
diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
index f60d0a9d505e2..d389c06cbce10 100644
--- a/io_uring/fdinfo.c
+++ b/io_uring/fdinfo.c
@@ -217,12 +217,12 @@ __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *file)
 			seq_puts(m, "\n");
 		}
 	}
-	seq_printf(m, "UserBufs:\t%u\n", ctx->buf_table.nr);
-	for (i = 0; has_lock && i < ctx->buf_table.nr; i++) {
+	seq_printf(m, "UserBufs:\t%u\n", ctx->buf_table.data.nr);
+	for (i = 0; has_lock && i < ctx->buf_table.data.nr; i++) {
 		struct io_mapped_ubuf *buf = NULL;

-		if (ctx->buf_table.nodes[i])
-			buf = ctx->buf_table.nodes[i]->buf;
+		if (ctx->buf_table.data.nodes[i])
+			buf = ctx->buf_table.data.nodes[i]->buf;
 		if (buf)
 			seq_printf(m, "%5u: 0x%llx/%u\n", i, buf->ubuf, buf->len);
 		else
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 7800edbc57279..bcc8ee31cc97c 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -1730,7 +1730,7 @@ static bool io_assign_buffer(struct io_kiocb *req, unsigned int issue_flags)
 		return true;

 	io_ring_submit_lock(ctx, issue_flags);
-	node = io_rsrc_node_lookup(&ctx->buf_table, req->buf_index);
+	node = io_rsrc_node_lookup(&ctx->buf_table.data, req->buf_index);
 	if (node)
 		io_req_assign_buf_node(req, node);
 	io_ring_submit_unlock(ctx, issue_flags);
diff --git a/io_uring/register.c b/io_uring/register.c
index cc23a4c205cd4..f15a8d52ad30f 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -926,7 +926,7 @@ SYSCALL_DEFINE4(io_uring_register, unsigned int, fd, unsigned int, opcode,
 	ret = __io_uring_register(ctx, opcode, arg, nr_args);

 	trace_io_uring_register(ctx, opcode, ctx->file_table.data.nr,
-				ctx->buf_table.nr, ret);
+				ctx->buf_table.data.nr, ret);
 	mutex_unlock(&ctx->uring_lock);

 	fput(file);
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 88bcacc77b72e..261b5535f46c6 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -235,9 +235,9 @@ static int __io_sqe_buffers_update(struct io_ring_ctx *ctx,
 	__u32 done;
 	int i, err;

-	if (!ctx->buf_table.nr)
+	if (!ctx->buf_table.data.nr)
 		return -ENXIO;
-	if (up->offset + nr_args > ctx->buf_table.nr)
+	if (up->offset + nr_args > ctx->buf_table.data.nr)
 		return -EINVAL;

 	for (done = 0; done < nr_args; done++) {
@@ -269,9 +269,9 @@ static int __io_sqe_buffers_update(struct io_ring_ctx *ctx,
 			}
 			node->tag = tag;
 		}
-		i = array_index_nospec(up->offset + done, ctx->buf_table.nr);
-		io_reset_rsrc_node(ctx, &ctx->buf_table, i);
-		ctx->buf_table.nodes[i] = node;
+		i = array_index_nospec(up->offset + done, ctx->buf_table.data.nr);
+		io_reset_rsrc_node(ctx, &ctx->buf_table.data, i);
+		ctx->buf_table.data.nodes[i] = node;
 		if (ctx->compat)
 			user_data += sizeof(struct compat_iovec);
 		else
@@ -549,9 +549,9 @@ int io_sqe_files_register(struct io_ring_ctx *ctx, void __user *arg,

 int io_sqe_buffers_unregister(struct io_ring_ctx *ctx)
 {
-	if (!ctx->buf_table.nr)
+	if (!ctx->buf_table.data.nr)
 		return -ENXIO;
-	io_rsrc_data_free(ctx, &ctx->buf_table);
+	io_rsrc_data_free(ctx, &ctx->buf_table.data);
 	return 0;
 }

@@ -578,8 +578,8 @@ static bool headpage_already_acct(struct io_ring_ctx *ctx, struct page **pages,
 	}

 	/* check previously registered pages */
-	for (i = 0; i < ctx->buf_table.nr; i++) {
-		struct io_rsrc_node *node = ctx->buf_table.nodes[i];
+	for (i = 0; i < ctx->buf_table.data.nr; i++) {
+		struct io_rsrc_node *node = ctx->buf_table.data.nodes[i];
 		struct io_mapped_ubuf *imu;

 		if (!node)
@@ -809,7 +809,7 @@ int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,

 	BUILD_BUG_ON(IORING_MAX_REG_BUFFERS >= (1u << 16));

-	if (ctx->buf_table.nr)
+	if (ctx->buf_table.data.nr)
 		return -EBUSY;
 	if (!nr_args || nr_args > IORING_MAX_REG_BUFFERS)
 		return -EINVAL;
@@ -862,7 +862,7 @@ int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
 		data.nodes[i] = node;
 	}

-	ctx->buf_table = data;
+	ctx->buf_table.data = data;
 	if (ret)
 		io_sqe_buffers_unregister(ctx);
 	return ret;
@@ -872,7 +872,7 @@ int io_buffer_register_bvec(struct io_ring_ctx *ctx, struct request *rq,
 			    void (*release)(void *), unsigned int index,
 			    unsigned int issue_flags)
 {
-	struct io_rsrc_data *data = &ctx->buf_table;
+	struct io_rsrc_data *data = &ctx->buf_table.data;
 	struct req_iterator rq_iter;
 	struct io_mapped_ubuf *imu;
 	struct io_rsrc_node *node;
@@ -930,7 +930,7 @@ EXPORT_SYMBOL_GPL(io_buffer_register_bvec);
 void io_buffer_unregister_bvec(struct io_ring_ctx *ctx, unsigned int index,
 			       unsigned int issue_flags)
 {
-	struct io_rsrc_data *data = &ctx->buf_table;
+	struct io_rsrc_data *data = &ctx->buf_table.data;
 	struct io_rsrc_node *node;

 	io_ring_submit_lock(ctx, issue_flags);
@@ -1049,10 +1049,10 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
 	if (!arg->nr && (arg->dst_off || arg->src_off))
 		return -EINVAL;
 	/* not allowed unless REPLACE is set */
-	if (ctx->buf_table.nr && !(arg->flags & IORING_REGISTER_DST_REPLACE))
+	if (ctx->buf_table.data.nr && !(arg->flags & IORING_REGISTER_DST_REPLACE))
 		return -EBUSY;

-	nbufs = src_ctx->buf_table.nr;
+	nbufs = src_ctx->buf_table.data.nr;
 	if (!arg->nr)
 		arg->nr = nbufs;
 	else if (arg->nr > nbufs)
@@ -1062,13 +1062,13 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
 	if (check_add_overflow(arg->nr, arg->dst_off, &nbufs))
 		return -EOVERFLOW;

-	ret = io_rsrc_data_alloc(&data, max(nbufs, ctx->buf_table.nr));
+	ret = io_rsrc_data_alloc(&data, max(nbufs, ctx->buf_table.data.nr));
 	if (ret)
 		return ret;

 	/* Fill entries in data from dst that won't overlap with src */
-	for (i = 0; i < min(arg->dst_off, ctx->buf_table.nr); i++) {
-		struct io_rsrc_node *src_node = ctx->buf_table.nodes[i];
+	for (i = 0; i < min(arg->dst_off, ctx->buf_table.data.nr); i++) {
+		struct io_rsrc_node *src_node = ctx->buf_table.data.nodes[i];

 		if (src_node) {
 			data.nodes[i] = src_node;
@@ -1077,7 +1077,7 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
 	}

 	ret = -ENXIO;
-	nbufs = src_ctx->buf_table.nr;
+	nbufs = src_ctx->buf_table.data.nr;
 	if (!nbufs)
 		goto out_free;
 	ret = -EINVAL;
@@ -1097,7 +1097,7 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
 	while (nr--) {
 		struct io_rsrc_node *dst_node, *src_node;

-		src_node = io_rsrc_node_lookup(&src_ctx->buf_table, i);
+		src_node = io_rsrc_node_lookup(&src_ctx->buf_table.data, i);
 		if (!src_node) {
 			dst_node = NULL;
 		} else {
@@ -1119,7 +1119,7 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
 	 * old and new nodes at this point.
 	 */
 	if (arg->flags & IORING_REGISTER_DST_REPLACE)
-		io_rsrc_data_free(ctx, &ctx->buf_table);
+		io_sqe_buffers_unregister(ctx);

 	/*
 	 * ctx->buf_table must be empty now - either the contents are being
@@ -1127,10 +1127,9 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
 	 * copied to a ring that does not have buffers yet (checked at function
 	 * entry).
 	 */
-	WARN_ON_ONCE(ctx->buf_table.nr);
-	ctx->buf_table = data;
+	WARN_ON_ONCE(ctx->buf_table.data.nr);
+	ctx->buf_table.data = data;
 	return 0;
-
 out_free:
 	io_rsrc_data_free(ctx, &data);
 	return ret;
@@ -1155,7 +1154,7 @@ int io_register_clone_buffers(struct io_ring_ctx *ctx, void __user *arg)
 		return -EFAULT;
 	if (buf.flags & ~(IORING_REGISTER_SRC_REGISTERED|IORING_REGISTER_DST_REPLACE))
 		return -EINVAL;
-	if (!(buf.flags & IORING_REGISTER_DST_REPLACE) && ctx->buf_table.nr)
+	if (!(buf.flags & IORING_REGISTER_DST_REPLACE) && ctx->buf_table.data.nr)
 		return -EBUSY;
 	if (memchr_inv(buf.pad, 0, sizeof(buf.pad)))
 		return -EINVAL;
From patchwork Tue Feb 18 22:42:29 2025
X-Patchwork-Submitter: Keith Busch
X-Patchwork-Id: 13980997
From: Keith Busch
Subject: [PATCHv4 5/5] io_uring: cache nodes and mapped buffers
Date: Tue, 18 Feb 2025 14:42:29 -0800
Message-ID: <20250218224229.837848-6-kbusch@meta.com>
X-Mailer: git-send-email 2.43.5
In-Reply-To: <20250218224229.837848-1-kbusch@meta.com>
References: <20250218224229.837848-1-kbusch@meta.com>
X-Mailing-List: io-uring@vger.kernel.org

From: Keith Busch

Frequent alloc/free cycles on these are pretty costly. Use an io cache
to more efficiently reuse these buffers.

Signed-off-by: Keith Busch
---
 include/linux/io_uring_types.h |  18 ++---
 io_uring/filetable.c           |   2 +-
 io_uring/rsrc.c                | 120 +++++++++++++++++++++++++--------
 io_uring/rsrc.h                |   2 +-
 4 files changed, 104 insertions(+), 38 deletions(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 810d1dccd27b1..bbfb651506522 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -69,8 +69,18 @@ struct io_file_table {
 	unsigned int alloc_hint;
 };

+struct io_alloc_cache {
+	void			**entries;
+	unsigned int		nr_cached;
+	unsigned int		max_cached;
+	size_t			elem_size;
+	unsigned int		init_clear;
+};
+
 struct io_buf_table {
 	struct io_rsrc_data	data;
+	struct io_alloc_cache	node_cache;
+	struct io_alloc_cache	imu_cache;
 };

 struct io_hash_bucket {
@@ -224,14 +234,6 @@ struct io_submit_state {
 	struct blk_plug		plug;
 };

-struct io_alloc_cache {
-	void			**entries;
-	unsigned int		nr_cached;
-	unsigned int		max_cached;
-	unsigned int		elem_size;
-	unsigned int		init_clear;
-};
-
 struct io_ring_ctx {
 	/* const or read-mostly hot data */
 	struct {
diff --git a/io_uring/filetable.c b/io_uring/filetable.c
index dd8eeec97acf6..a21660e3145ab 100644
--- a/io_uring/filetable.c
+++ b/io_uring/filetable.c
@@ -68,7 +68,7 @@ static int io_install_fixed_file(struct io_ring_ctx *ctx, struct file *file,
 	if (slot_index >= ctx->file_table.data.nr)
 		return -EINVAL;

-	node = io_rsrc_node_alloc(IORING_RSRC_FILE);
+	node = io_rsrc_node_alloc(ctx, IORING_RSRC_FILE);
 	if (!node)
 		return -ENOMEM;
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 261b5535f46c6..d5cac3a234316 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -32,6 +32,8 @@ static struct io_rsrc_node *io_sqe_buffer_register(struct io_ring_ctx *ctx,
 #define IORING_MAX_FIXED_FILES	(1U << 20)
 #define IORING_MAX_REG_BUFFERS	(1U << 14)

+#define IO_CACHED_BVECS_SEGS	32
+
 int __io_account_mem(struct user_struct *user, unsigned long nr_pages)
 {
 	unsigned long page_limit, cur_pages, new_pages;
@@ -101,6 +103,22 @@ int io_buffer_validate(struct iovec *iov)
 	return 0;
 }

+static struct io_mapped_ubuf *io_alloc_imu(struct io_ring_ctx *ctx,
+					   int nr_bvecs)
+{
+	if (nr_bvecs <= IO_CACHED_BVECS_SEGS)
+		return io_cache_alloc(&ctx->buf_table.imu_cache, GFP_KERNEL);
+	return kvmalloc(struct_size_t(struct io_mapped_ubuf, bvec, nr_bvecs),
+			GFP_KERNEL);
+}
+
+static void io_free_imu(struct io_ring_ctx *ctx, struct io_mapped_ubuf *imu)
+{
+	if (imu->nr_bvecs > IO_CACHED_BVECS_SEGS ||
+	    !io_alloc_cache_put(&ctx->buf_table.imu_cache, imu))
+		kvfree(imu);
+}
+
 static void io_buffer_unmap(struct io_ring_ctx *ctx, struct io_rsrc_node *node)
 {
 	struct io_mapped_ubuf *imu = node->buf;
@@ -119,22 +137,35 @@ static void io_buffer_unmap(struct io_ring_ctx *ctx, struct io_rsrc_node *node)
 			io_unaccount_mem(ctx, imu->acct_pages);
 	}

-	kvfree(imu);
+	io_free_imu(ctx, imu);
 }

-struct io_rsrc_node *io_rsrc_node_alloc(int type)
+struct io_rsrc_node *io_rsrc_node_alloc(struct io_ring_ctx *ctx, int type)
 {
 	struct io_rsrc_node *node;

-	node = kzalloc(sizeof(*node), GFP_KERNEL);
+	if (type == IORING_RSRC_FILE)
+		node = kmalloc(sizeof(*node), GFP_KERNEL);
+	else
+		node = io_cache_alloc(&ctx->buf_table.node_cache, GFP_KERNEL);
 	if (node) {
 		node->type = type;
 		node->refs = 1;
+		node->tag = 0;
+		node->file_ptr = 0;
 	}
 	return node;
 }

-__cold void io_rsrc_data_free(struct io_ring_ctx *ctx, struct io_rsrc_data *data)
+static __cold void __io_rsrc_data_free(struct io_rsrc_data *data)
+{
+	kvfree(data->nodes);
+	data->nodes = NULL;
+	data->nr = 0;
+}
+
+__cold void io_rsrc_data_free(struct io_ring_ctx *ctx,
+			      struct io_rsrc_data *data)
 {
 	if (!data->nr)
 		return;
@@ -142,9 +173,7 @@ __cold void io_rsrc_data_free(struct io_ring_ctx *ctx, struct io_rsrc_data *data
 		if (data->nodes[data->nr])
 			io_put_rsrc_node(ctx, data->nodes[data->nr]);
 	}
-	kvfree(data->nodes);
-	data->nodes = NULL;
-	data->nr = 0;
+	__io_rsrc_data_free(data);
 }

 __cold int io_rsrc_data_alloc(struct io_rsrc_data *data, unsigned nr)
@@ -158,6 +187,31 @@ __cold int io_rsrc_data_alloc(struct io_rsrc_data *data, unsigned nr)
 	return -ENOMEM;
 }

+static __cold int io_rsrc_buffer_alloc(struct io_buf_table *table, unsigned nr)
+{
+	const int imu_cache_size = struct_size_t(struct io_mapped_ubuf, bvec,
+						 IO_CACHED_BVECS_SEGS);
+	const int node_size = sizeof(struct io_rsrc_node);
+	int ret;
+
+	ret = io_rsrc_data_alloc(&table->data, nr);
+	if (ret)
+		return ret;
+
+	if (io_alloc_cache_init(&table->node_cache, nr, node_size, 0))
+		goto free_data;
+
+	if (io_alloc_cache_init(&table->imu_cache, nr, imu_cache_size, 0))
+		goto free_cache;
+
+	return 0;
+free_cache:
+	io_alloc_cache_free(&table->node_cache, kfree);
+free_data:
+	__io_rsrc_data_free(&table->data);
+	return -ENOMEM;
+}
+
 static int __io_sqe_files_update(struct io_ring_ctx *ctx,
 				 struct io_uring_rsrc_update2 *up,
 				 unsigned nr_args)
@@ -207,7 +261,7 @@ static int __io_sqe_files_update(struct io_ring_ctx *ctx,
 			err = -EBADF;
 			break;
 		}
-		node = io_rsrc_node_alloc(IORING_RSRC_FILE);
+		node = io_rsrc_node_alloc(ctx, IORING_RSRC_FILE);
 		if (!node) {
 			err = -ENOMEM;
 			fput(file);
@@ -459,6 +513,8 @@ void io_free_rsrc_node(struct io_ring_ctx *ctx, struct io_rsrc_node *node)
 	case IORING_RSRC_BUFFER:
 		if (node->buf)
 			io_buffer_unmap(ctx, node);
+		if (io_alloc_cache_put(&ctx->buf_table.node_cache, node))
+			return;
 		break;
 	default:
 		WARN_ON_ONCE(1);
@@ -527,7 +583,7 @@ int io_sqe_files_register(struct io_ring_ctx *ctx, void __user *arg,
 			goto fail;
 		}
 		ret = -ENOMEM;
-		node = io_rsrc_node_alloc(IORING_RSRC_FILE);
+		node = io_rsrc_node_alloc(ctx, IORING_RSRC_FILE);
 		if (!node) {
 			fput(file);
 			goto fail;
@@ -547,11 +603,19 @@ int io_sqe_files_register(struct io_ring_ctx *ctx, void __user *arg,
 	return ret;
 }

+static void io_rsrc_buffer_free(struct io_ring_ctx *ctx,
+				struct io_buf_table *table)
+{
+	io_rsrc_data_free(ctx, &table->data);
+	io_alloc_cache_free(&table->node_cache, kfree);
+	io_alloc_cache_free(&table->imu_cache, kfree);
+}
+
 int io_sqe_buffers_unregister(struct io_ring_ctx *ctx)
 {
 	if (!ctx->buf_table.data.nr)
 		return -ENXIO;
-	io_rsrc_data_free(ctx, &ctx->buf_table.data);
+	io_rsrc_buffer_free(ctx, &ctx->buf_table);
 	return 0;
 }

@@ -732,7 +796,7 @@ static struct io_rsrc_node *io_sqe_buffer_register(struct io_ring_ctx *ctx,
 	if (!iov->iov_base)
 		return NULL;

-	node = io_rsrc_node_alloc(IORING_RSRC_BUFFER);
+	node = io_rsrc_node_alloc(ctx, IORING_RSRC_BUFFER);
 	if (!node)
 		return ERR_PTR(-ENOMEM);
 	node->buf = NULL;
@@ -752,7 +816,7 @@ static struct io_rsrc_node *io_sqe_buffer_register(struct io_ring_ctx *ctx,
 		coalesced = io_coalesce_buffer(&pages, &nr_pages, &data);
 	}

-	imu = kvmalloc(struct_size(imu, bvec, nr_pages), GFP_KERNEL);
+	imu = io_alloc_imu(ctx, nr_pages);
 	if (!imu)
 		goto done;

@@ -789,7 +853,7 @@ static struct io_rsrc_node *io_sqe_buffer_register(struct io_ring_ctx *ctx,
 	}
 done:
 	if (ret) {
-		kvfree(imu);
+		io_free_imu(ctx, imu);
 		if (node)
 			io_put_rsrc_node(ctx, node);
 		node = ERR_PTR(ret);
@@ -802,9 +866,9 @@ int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
 			    unsigned int nr_args, u64 __user *tags)
 {
 	struct page *last_hpage = NULL;
-	struct io_rsrc_data data;
 	struct iovec fast_iov, *iov = &fast_iov;
 	const struct iovec __user *uvec;
+	struct io_buf_table table;
 	int i, ret;

 	BUILD_BUG_ON(IORING_MAX_REG_BUFFERS >= (1u << 16));
@@ -813,13 +877,14 @@ int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
 		return -EBUSY;
 	if (!nr_args || nr_args > IORING_MAX_REG_BUFFERS)
 		return -EINVAL;
-	ret = io_rsrc_data_alloc(&data, nr_args);
+	ret = io_rsrc_buffer_alloc(&table, nr_args);
 	if (ret)
 		return ret;

 	if (!arg)
 		memset(iov, 0, sizeof(*iov));

+	ctx->buf_table = table;
 	for (i = 0; i < nr_args; i++) {
 		struct io_rsrc_node *node;
 		u64 tag = 0;
@@ -859,10 +924,8 @@ int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
 			}
 			node->tag = tag;
 		}
-		data.nodes[i] = node;
+		table.data.nodes[i] = node;
 	}
-
-	ctx->buf_table.data = data;
 	if (ret)
 		io_sqe_buffers_unregister(ctx);
 	return ret;
@@ -887,14 +950,15 @@ int io_buffer_register_bvec(struct io_ring_ctx *ctx, struct request *rq,
 		goto unlock;
 	}

-	node = io_rsrc_node_alloc(IORING_RSRC_BUFFER);
+	node = io_rsrc_node_alloc(ctx, IORING_RSRC_BUFFER);
 	if (!node) {
 		ret = -ENOMEM;
 		goto unlock;
 	}

 	nr_bvecs = blk_rq_nr_phys_segments(rq);
-	imu = kvmalloc(struct_size(imu, bvec, nr_bvecs), GFP_KERNEL);
+
+	imu = io_alloc_imu(ctx, nr_bvecs);
 	if (!imu) {
 		kfree(node);
 		ret = -ENOMEM;
@@ -1031,7 +1095,7 @@ static void lock_two_rings(struct io_ring_ctx *ctx1, struct io_ring_ctx *ctx2)
 static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx,
 			    struct io_uring_clone_buffers *arg)
 {
-	struct io_rsrc_data data;
+	struct io_buf_table table;
 	int i, ret, off, nr;
 	unsigned int nbufs;

@@ -1062,7 +1126,7 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
 	if (check_add_overflow(arg->nr, arg->dst_off, &nbufs))
 		return -EOVERFLOW;

-	ret = io_rsrc_data_alloc(&data, max(nbufs, ctx->buf_table.data.nr));
+	ret = io_rsrc_buffer_alloc(&table, max(nbufs, ctx->buf_table.data.nr));
 	if (ret)
 		return ret;

@@ -1071,7 +1135,7 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
 		struct io_rsrc_node *src_node = ctx->buf_table.data.nodes[i];

 		if (src_node) {
-			data.nodes[i] = src_node;
+			table.data.nodes[i] = src_node;
 			src_node->refs++;
 		}
 	}
@@ -1101,7 +1165,7 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
 		if (!src_node) {
 			dst_node = NULL;
 		} else {
-			dst_node = io_rsrc_node_alloc(IORING_RSRC_BUFFER);
+			dst_node = io_rsrc_node_alloc(ctx, IORING_RSRC_BUFFER);
 			if (!dst_node) {
 				ret = -ENOMEM;
 				goto out_free;
@@ -1110,12 +1174,12 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
 			refcount_inc(&src_node->buf->refs);
 			dst_node->buf = src_node->buf;
 		}
-		data.nodes[off++] = dst_node;
+		table.data.nodes[off++] = dst_node;
 		i++;
 	}

 	/*
-	 * If asked for replace, put the old table. data->nodes[] holds both
+	 * If asked for replace, put the old table. table.data->nodes[] holds both
 	 * old and new nodes at this point.
 	 */
 	if (arg->flags & IORING_REGISTER_DST_REPLACE)
@@ -1128,10 +1192,10 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
 	 * entry).
 	 */
 	WARN_ON_ONCE(ctx->buf_table.data.nr);
-	ctx->buf_table.data = data;
+	ctx->buf_table = table;
 	return 0;
 out_free:
-	io_rsrc_data_free(ctx, &data);
+	io_rsrc_buffer_free(ctx, &table);
 	return ret;
 }
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index 81c31b93d4f7b..69cc212ad0c7c 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -49,7 +49,7 @@ struct io_imu_folio_data {
 	unsigned int	nr_folios;
 };

-struct io_rsrc_node *io_rsrc_node_alloc(int type);
+struct io_rsrc_node *io_rsrc_node_alloc(struct io_ring_ctx *ctx, int type);
 void io_free_rsrc_node(struct io_ring_ctx *ctx, struct io_rsrc_node *node);
 void io_rsrc_data_free(struct io_ring_ctx *ctx, struct io_rsrc_data *data);
 int io_rsrc_data_alloc(struct io_rsrc_data *data, unsigned nr);
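
[Editorial sketch, not part of the patch: a minimal userspace analogue of
the io_alloc_cache pattern this patch applies to nodes and imus - a
fixed-capacity stack of recycled fixed-size objects, falling back to the
allocator when the cache is empty or full. Names are illustrative, not
kernel API; the kernel helpers (io_alloc_cache_init, io_cache_alloc,
io_alloc_cache_put) follow the same shape.]

	#include <stdlib.h>

	struct obj_cache {
		void **entries;		/* stack of recycled objects */
		unsigned int nr_cached;
		unsigned int max_cached;
		size_t elem_size;
	};

	static int cache_init(struct obj_cache *c, unsigned int max,
			      size_t elem_size)
	{
		c->entries = calloc(max, sizeof(void *));
		if (!c->entries)
			return -1;
		c->nr_cached = 0;
		c->max_cached = max;
		c->elem_size = elem_size;
		return 0;
	}

	static void *cache_alloc(struct obj_cache *c)
	{
		if (c->nr_cached)	/* hot path: reuse, no allocator call */
			return c->entries[--c->nr_cached];
		return malloc(c->elem_size);
	}

	static int cache_put(struct obj_cache *c, void *p)
	{
		if (c->nr_cached < c->max_cached) {
			c->entries[c->nr_cached++] = p;	/* recycle */
			return 1;
		}
		free(p);	/* cache full: really free */
		return 0;
	}

The payoff is the same as in the patch: per-I/O node and small-imu
allocations become a pointer pop/push instead of a slab round trip.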