From patchwork Mon Feb 24 21:31:06 2025
X-Patchwork-Submitter: Keith Busch
X-Patchwork-Id: 13988997
From: Keith Busch
Subject: [PATCHv5 01/11] io_uring/rsrc: remove redundant check for valid imu
Date: Mon, 24 Feb 2025 13:31:06 -0800
Message-ID: <20250224213116.3509093-2-kbusch@meta.com>
In-Reply-To: <20250224213116.3509093-1-kbusch@meta.com>
References: <20250224213116.3509093-1-kbusch@meta.com>
X-Mailing-List: linux-block@vger.kernel.org
From: Keith Busch

The only caller of io_buffer_unmap() already checks that the node's buf
is non-NULL, so there is no need to check it again.

Signed-off-by: Keith Busch
Reviewed-by: Ming Lei
Reviewed-by: Pavel Begunkov
---
 io_uring/rsrc.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 20b884c84e55f..efef29352dcfb 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -103,19 +103,16 @@ int io_buffer_validate(struct iovec *iov)
 static void io_buffer_unmap(struct io_ring_ctx *ctx, struct io_rsrc_node *node)
 {
+	struct io_mapped_ubuf *imu = node->buf;
 	unsigned int i;
 
-	if (node->buf) {
-		struct io_mapped_ubuf *imu = node->buf;
-
-		if (!refcount_dec_and_test(&imu->refs))
-			return;
-		for (i = 0; i < imu->nr_bvecs; i++)
-			unpin_user_page(imu->bvec[i].bv_page);
-		if (imu->acct_pages)
-			io_unaccount_mem(ctx, imu->acct_pages);
-		kvfree(imu);
-	}
+	if (!refcount_dec_and_test(&imu->refs))
+		return;
+	for (i = 0; i < imu->nr_bvecs; i++)
+		unpin_user_page(imu->bvec[i].bv_page);
+	if (imu->acct_pages)
+		io_unaccount_mem(ctx, imu->acct_pages);
+	kvfree(imu);
 }
 
 struct io_rsrc_node *io_rsrc_node_alloc(int type)
From patchwork Mon Feb 24 21:31:07 2025
X-Patchwork-Submitter: Keith Busch
X-Patchwork-Id: 13988988
From: Keith Busch
Subject: [PATCHv5 02/11] io_uring/nop: reuse req->buf_index
Date: Mon, 24 Feb 2025 13:31:07 -0800
Message-ID: <20250224213116.3509093-3-kbusch@meta.com>
In-Reply-To: <20250224213116.3509093-1-kbusch@meta.com>
References: <20250224213116.3509093-1-kbusch@meta.com>
X-Mailing-List: linux-block@vger.kernel.org

From: Keith Busch

There is already a field in io_kiocb that can store a registered buffer
index; use that instead of stashing the value in struct io_nop.
Signed-off-by: Keith Busch
Reviewed-by: Ming Lei
Reviewed-by: Pavel Begunkov
---
 io_uring/nop.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/io_uring/nop.c b/io_uring/nop.c
index 5e5196df650a1..ea539531cb5f6 100644
--- a/io_uring/nop.c
+++ b/io_uring/nop.c
@@ -16,7 +16,6 @@ struct io_nop {
 	struct file *file;
 	int result;
 	int fd;
-	int buffer;
 	unsigned int flags;
 };
 
@@ -40,9 +39,7 @@ int io_nop_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	else
 		nop->fd = -1;
 	if (nop->flags & IORING_NOP_FIXED_BUFFER)
-		nop->buffer = READ_ONCE(sqe->buf_index);
-	else
-		nop->buffer = -1;
+		req->buf_index = READ_ONCE(sqe->buf_index);
 	return 0;
 }
 
@@ -69,7 +66,7 @@ int io_nop(struct io_kiocb *req, unsigned int issue_flags)
 		ret = -EFAULT;
 		io_ring_submit_lock(ctx, issue_flags);
-		node = io_rsrc_node_lookup(&ctx->buf_table, nop->buffer);
+		node = io_rsrc_node_lookup(&ctx->buf_table, req->buf_index);
 		if (node) {
 			io_req_assign_buf_node(req, node);
 			ret = 0;
From patchwork Mon Feb 24 21:31:08 2025
X-Patchwork-Submitter: Keith Busch
X-Patchwork-Id: 13988999
From: Keith Busch
Subject: [PATCHv5 03/11] io_uring/net: reuse req->buf_index for sendzc
Date: Mon, 24 Feb 2025 13:31:08 -0800
Message-ID: <20250224213116.3509093-4-kbusch@meta.com>
In-Reply-To: <20250224213116.3509093-1-kbusch@meta.com>
References: <20250224213116.3509093-1-kbusch@meta.com>
X-Mailing-List: linux-block@vger.kernel.org

From: Pavel Begunkov

There is already a field in io_kiocb that can store a registered buffer
index; use that instead of stashing the value in struct io_sr_msg.
Reviewed-by: Keith Busch
Signed-off-by: Pavel Begunkov
Reviewed-by: Ming Lei
Reviewed-by: Pavel Begunkov
---
 io_uring/net.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/io_uring/net.c b/io_uring/net.c
index 173546415ed17..fa35a6b58d472 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -76,7 +76,6 @@ struct io_sr_msg {
 	u16 flags;
 	/* initialised and used only by !msg send variants */
 	u16 buf_group;
-	u16 buf_index;
 	bool retry;
 	void __user *msg_control;
 	/* used only for send zerocopy */
@@ -1371,7 +1370,7 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	zc->len = READ_ONCE(sqe->len);
 	zc->msg_flags = READ_ONCE(sqe->msg_flags) | MSG_NOSIGNAL | MSG_ZEROCOPY;
-	zc->buf_index = READ_ONCE(sqe->buf_index);
+	req->buf_index = READ_ONCE(sqe->buf_index);
 
 	if (zc->msg_flags & MSG_DONTWAIT)
 		req->flags |= REQ_F_NOWAIT;
@@ -1447,7 +1446,7 @@ static int io_send_zc_import(struct io_kiocb *req, unsigned int issue_flags)
 		ret = -EFAULT;
 		io_ring_submit_lock(ctx, issue_flags);
-		node = io_rsrc_node_lookup(&ctx->buf_table, sr->buf_index);
+		node = io_rsrc_node_lookup(&ctx->buf_table, req->buf_index);
 		if (node) {
 			io_req_assign_buf_node(sr->notif, node);
 			ret = 0;
From patchwork Mon Feb 24 21:31:09 2025
X-Patchwork-Submitter: Keith Busch
X-Patchwork-Id: 13988991
From: Keith Busch
Subject: [PATCHv5 04/11] io_uring/nvme: pass issue_flags to io_uring_cmd_import_fixed()
Date: Mon, 24 Feb 2025 13:31:09 -0800
Message-ID: <20250224213116.3509093-5-kbusch@meta.com>
In-Reply-To: <20250224213116.3509093-1-kbusch@meta.com>
References: <20250224213116.3509093-1-kbusch@meta.com>
X-Mailing-List: linux-block@vger.kernel.org

From: Pavel Begunkov

io_uring_cmd_import_fixed() will need to know the io_uring execution
state in the following commits; for now, just pass issue_flags into it
without actually using it.
Reviewed-by: Keith Busch
Signed-off-by: Pavel Begunkov
Reviewed-by: Ming Lei
---
 drivers/nvme/host/ioctl.c    | 10 ++++++----
 include/linux/io_uring/cmd.h |  6 ++++--
 io_uring/uring_cmd.c         |  3 ++-
 3 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index e8930146847af..e0876bc9aacde 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -114,7 +114,8 @@ static struct request *nvme_alloc_user_request(struct request_queue *q,
 static int nvme_map_user_request(struct request *req, u64 ubuffer,
 		unsigned bufflen, void __user *meta_buffer, unsigned meta_len,
-		struct io_uring_cmd *ioucmd, unsigned int flags)
+		struct io_uring_cmd *ioucmd, unsigned int flags,
+		unsigned int iou_issue_flags)
 {
 	struct request_queue *q = req->q;
 	struct nvme_ns *ns = q->queuedata;
@@ -142,7 +143,8 @@ static int nvme_map_user_request(struct request *req, u64 ubuffer,
 		if (WARN_ON_ONCE(flags & NVME_IOCTL_VEC))
 			return -EINVAL;
 		ret = io_uring_cmd_import_fixed(ubuffer, bufflen,
-				rq_data_dir(req), &iter, ioucmd);
+				rq_data_dir(req), &iter, ioucmd,
+				iou_issue_flags);
 		if (ret < 0)
 			goto out;
 		ret = blk_rq_map_user_iov(q, req, NULL, &iter, GFP_KERNEL);
@@ -194,7 +196,7 @@ static int nvme_submit_user_cmd(struct request_queue *q,
 	req->timeout = timeout;
 	if (ubuffer && bufflen) {
 		ret = nvme_map_user_request(req, ubuffer, bufflen, meta_buffer,
-				meta_len, NULL, flags);
+				meta_len, NULL, flags, 0);
 		if (ret)
 			return ret;
 	}
@@ -514,7 +516,7 @@ static int nvme_uring_cmd_io(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 	if (d.addr && d.data_len) {
 		ret = nvme_map_user_request(req, d.addr, d.data_len,
 				nvme_to_user_ptr(d.metadata),
-				d.metadata_len, ioucmd, vec);
+				d.metadata_len, ioucmd, vec, issue_flags);
 		if (ret)
 			return ret;
 	}
diff --git a/include/linux/io_uring/cmd.h b/include/linux/io_uring/cmd.h
index abd0c8bd950ba..87150dc0a07cf 100644
--- a/include/linux/io_uring/cmd.h
+++ b/include/linux/io_uring/cmd.h
@@ -39,7 +39,8 @@ static inline void io_uring_cmd_private_sz_check(size_t cmd_sz)
 #if defined(CONFIG_IO_URING)
 int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
-			      struct iov_iter *iter, void *ioucmd);
+			      struct iov_iter *iter, void *ioucmd,
+			      unsigned int issue_flags);
 
 /*
  * Completes the request, i.e. posts an io_uring CQE and deallocates @ioucmd
@@ -67,7 +68,8 @@ void io_uring_cmd_issue_blocking(struct io_uring_cmd *ioucmd);
 #else
 static inline int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
-			      struct iov_iter *iter, void *ioucmd)
+			      struct iov_iter *iter, void *ioucmd,
+			      unsigned int issue_flags)
 {
 	return -EOPNOTSUPP;
 }
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 14086a2664611..28ed69c40756e 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -257,7 +257,8 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
 }
 
 int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
-			      struct iov_iter *iter, void *ioucmd)
+			      struct iov_iter *iter, void *ioucmd,
+			      unsigned int issue_flags)
 {
 	struct io_kiocb *req = cmd_to_io_kiocb(ioucmd);
 	struct io_rsrc_node *node = req->buf_node;
From patchwork Mon Feb 24 21:31:10 2025
X-Patchwork-Submitter: Keith Busch
X-Patchwork-Id: 13988995
From: Keith Busch
Subject: [PATCHv5 05/11] io_uring: combine buffer lookup and import
Date: Mon, 24 Feb 2025 13:31:10 -0800
Message-ID: <20250224213116.3509093-6-kbusch@meta.com>
In-Reply-To: <20250224213116.3509093-1-kbusch@meta.com>
References: <20250224213116.3509093-1-kbusch@meta.com>
X-Mailing-List: linux-block@vger.kernel.org

From: Pavel Begunkov

Registered buffers are currently imported in two steps: first we look up
an rsrc node, then we use it to set up the iterator. The first part is
usually done at the prep stage, and the import happens whenever it's
needed. As we want to defer binding to a node so that it works with
linked requests, combine both steps into a single helper.
Reviewed-by: Keith Busch Signed-off-by: Pavel Begunkov Reviewed-by: Ming Lei --- io_uring/net.c | 22 ++++------------------ io_uring/rsrc.c | 31 ++++++++++++++++++++++++++++++- io_uring/rsrc.h | 6 +++--- io_uring/rw.c | 9 +-------- io_uring/uring_cmd.c | 25 ++++--------------------- 5 files changed, 42 insertions(+), 51 deletions(-) diff --git a/io_uring/net.c b/io_uring/net.c index fa35a6b58d472..f223721418fac 100644 --- a/io_uring/net.c +++ b/io_uring/net.c @@ -1441,24 +1441,10 @@ static int io_send_zc_import(struct io_kiocb *req, unsigned int issue_flags) int ret; if (sr->flags & IORING_RECVSEND_FIXED_BUF) { - struct io_ring_ctx *ctx = req->ctx; - struct io_rsrc_node *node; - - ret = -EFAULT; - io_ring_submit_lock(ctx, issue_flags); - node = io_rsrc_node_lookup(&ctx->buf_table, req->buf_index); - if (node) { - io_req_assign_buf_node(sr->notif, node); - ret = 0; - } - io_ring_submit_unlock(ctx, issue_flags); - - if (unlikely(ret)) - return ret; - - ret = io_import_fixed(ITER_SOURCE, &kmsg->msg.msg_iter, - node->buf, (u64)(uintptr_t)sr->buf, - sr->len); + sr->notif->buf_index = req->buf_index; + ret = io_import_reg_buf(sr->notif, &kmsg->msg.msg_iter, + (u64)(uintptr_t)sr->buf, sr->len, + ITER_SOURCE, issue_flags); if (unlikely(ret)) return ret; kmsg->msg.sg_from_iter = io_sg_from_iter; diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c index efef29352dcfb..f814526982c36 100644 --- a/io_uring/rsrc.c +++ b/io_uring/rsrc.c @@ -857,7 +857,7 @@ int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg, return ret; } -int io_import_fixed(int ddir, struct iov_iter *iter, +static int io_import_fixed(int ddir, struct iov_iter *iter, struct io_mapped_ubuf *imu, u64 buf_addr, size_t len) { @@ -916,6 +916,35 @@ int io_import_fixed(int ddir, struct iov_iter *iter, return 0; } +static inline struct io_rsrc_node *io_find_buf_node(struct io_kiocb *req, + unsigned issue_flags) +{ + struct io_ring_ctx *ctx = req->ctx; + struct io_rsrc_node *node; + + if (req->flags & 
REQ_F_BUF_NODE) + return req->buf_node; + + io_ring_submit_lock(ctx, issue_flags); + node = io_rsrc_node_lookup(&ctx->buf_table, req->buf_index); + if (node) + io_req_assign_buf_node(req, node); + io_ring_submit_unlock(ctx, issue_flags); + return node; +} + +int io_import_reg_buf(struct io_kiocb *req, struct iov_iter *iter, + u64 buf_addr, size_t len, int ddir, + unsigned issue_flags) +{ + struct io_rsrc_node *node; + + node = io_find_buf_node(req, issue_flags); + if (!node) + return -EFAULT; + return io_import_fixed(ddir, iter, node->buf, buf_addr, len); +} + /* Lock two rings at once. The rings must be different! */ static void lock_two_rings(struct io_ring_ctx *ctx1, struct io_ring_ctx *ctx2) { diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h index 2b1e258954092..f0e9080599646 100644 --- a/io_uring/rsrc.h +++ b/io_uring/rsrc.h @@ -44,9 +44,9 @@ void io_free_rsrc_node(struct io_ring_ctx *ctx, struct io_rsrc_node *node); void io_rsrc_data_free(struct io_ring_ctx *ctx, struct io_rsrc_data *data); int io_rsrc_data_alloc(struct io_rsrc_data *data, unsigned nr); -int io_import_fixed(int ddir, struct iov_iter *iter, - struct io_mapped_ubuf *imu, - u64 buf_addr, size_t len); +int io_import_reg_buf(struct io_kiocb *req, struct iov_iter *iter, + u64 buf_addr, size_t len, int ddir, + unsigned issue_flags); int io_register_clone_buffers(struct io_ring_ctx *ctx, void __user *arg); int io_sqe_buffers_unregister(struct io_ring_ctx *ctx); diff --git a/io_uring/rw.c b/io_uring/rw.c index 3443f418d9120..db24bcd4c6335 100644 --- a/io_uring/rw.c +++ b/io_uring/rw.c @@ -352,8 +352,6 @@ static int io_prep_rw_fixed(struct io_kiocb *req, const struct io_uring_sqe *sqe int ddir) { struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw); - struct io_ring_ctx *ctx = req->ctx; - struct io_rsrc_node *node; struct io_async_rw *io; int ret; @@ -361,13 +359,8 @@ static int io_prep_rw_fixed(struct io_kiocb *req, const struct io_uring_sqe *sqe if (unlikely(ret)) return ret; - node = 
io_rsrc_node_lookup(&ctx->buf_table, req->buf_index); - if (!node) - return -EFAULT; - io_req_assign_buf_node(req, node); - io = req->async_data; - ret = io_import_fixed(ddir, &io->iter, node->buf, rw->addr, rw->len); + ret = io_import_reg_buf(req, &io->iter, rw->addr, rw->len, ddir, 0); iov_iter_save_state(&io->iter, &io->iter_state); return ret; } diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c index 28ed69c40756e..31d5e0948af14 100644 --- a/io_uring/uring_cmd.c +++ b/io_uring/uring_cmd.c @@ -199,21 +199,9 @@ int io_uring_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) if (ioucmd->flags & ~IORING_URING_CMD_MASK) return -EINVAL; - if (ioucmd->flags & IORING_URING_CMD_FIXED) { - struct io_ring_ctx *ctx = req->ctx; - struct io_rsrc_node *node; - u16 index = READ_ONCE(sqe->buf_index); - - node = io_rsrc_node_lookup(&ctx->buf_table, index); - if (unlikely(!node)) - return -EFAULT; - /* - * Pi node upfront, prior to io_uring_cmd_import_fixed() - * being called. This prevents destruction of the mapped buffer - * we'll need at actual import time. 
-		 */
-		io_req_assign_buf_node(req, node);
-	}
+	if (ioucmd->flags & IORING_URING_CMD_FIXED)
+		req->buf_index = READ_ONCE(sqe->buf_index);
+
 	ioucmd->cmd_op = READ_ONCE(sqe->cmd_op);
 
 	return io_uring_cmd_prep_setup(req, sqe);
 
@@ -261,13 +249,8 @@ int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
 			      unsigned int issue_flags)
 {
 	struct io_kiocb *req = cmd_to_io_kiocb(ioucmd);
-	struct io_rsrc_node *node = req->buf_node;
-
-	/* Must have had rsrc_node assigned at prep time */
-	if (node)
-		return io_import_fixed(rw, iter, node->buf, ubuf, len);
-	return -EFAULT;
+
+	return io_import_reg_buf(req, iter, ubuf, len, rw, issue_flags);
 }
 EXPORT_SYMBOL_GPL(io_uring_cmd_import_fixed);

From patchwork Mon Feb 24 21:31:11 2025
From: Keith Busch
Subject: [PATCHv5 06/11] io_uring/rw: move fixed buffer import to issue path
Date: Mon, 24 Feb 2025 13:31:11 -0800
Message-ID: <20250224213116.3509093-7-kbusch@meta.com>

From: Keith Busch

Registered buffers may depend on a linked command, which makes the prep
path too early to import. Move the import to the issue path, when the
node is actually needed, like all the other users of fixed buffers.

Signed-off-by: Keith Busch
Reviewed-by: Ming Lei
Reviewed-by: Pavel Begunkov
---
 io_uring/opdef.c |  8 ++++----
 io_uring/rw.c    | 43 ++++++++++++++++++++++++++-----------------
 io_uring/rw.h    |  4 ++--
 3 files changed, 32 insertions(+), 23 deletions(-)

diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index 9344534780a02..5369ae33b5ad9 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -104,8 +104,8 @@ const struct io_issue_def io_issue_defs[] = {
 		.iopoll = 1,
 		.iopoll_queue = 1,
 		.async_size = sizeof(struct io_async_rw),
-		.prep = io_prep_read_fixed,
-		.issue = io_read,
+		.prep = io_prep_read,
+		.issue = io_read_fixed,
 	},
 	[IORING_OP_WRITE_FIXED] = {
 		.needs_file = 1,
@@ -118,8 +118,8 @@ const struct io_issue_def io_issue_defs[] = {
 		.iopoll = 1,
 		.iopoll_queue = 1,
 		.async_size = sizeof(struct io_async_rw),
-		.prep = io_prep_write_fixed,
-		.issue = io_write,
+		.prep = io_prep_write,
+		.issue = io_write_fixed,
 	},
 	[IORING_OP_POLL_ADD] = {
 		.needs_file = 1,
diff --git a/io_uring/rw.c b/io_uring/rw.c
index db24bcd4c6335..5f37fa48fdd9b 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -348,33 +348,20 @@ int io_prep_writev(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	return io_prep_rwv(req, sqe, ITER_SOURCE);
 }
 
-static int io_prep_rw_fixed(struct io_kiocb *req, const struct io_uring_sqe *sqe,
-			    int ddir)
+static int io_init_rw_fixed(struct io_kiocb *req, unsigned int issue_flags, int ddir)
 {
 	struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);
-	struct io_async_rw *io;
+	struct io_async_rw *io = req->async_data;
 	int ret;
 
-	ret = io_prep_rw(req, sqe, ddir, false);
-	if (unlikely(ret))
-		return ret;
+	if (io->bytes_done)
+		return 0;
 
-	io = req->async_data;
 	ret = io_import_reg_buf(req, &io->iter, rw->addr, rw->len, ddir, 0);
 	iov_iter_save_state(&io->iter, &io->iter_state);
 	return ret;
 }
 
-int io_prep_read_fixed(struct io_kiocb *req, const struct io_uring_sqe *sqe)
-{
-	return io_prep_rw_fixed(req, sqe, ITER_DEST);
-}
-
-int io_prep_write_fixed(struct io_kiocb *req, const struct io_uring_sqe *sqe)
-{
-	return io_prep_rw_fixed(req, sqe, ITER_SOURCE);
-}
-
 /*
  * Multishot read is prepared just like a normal read/write request, only
  * difference is that we set the MULTISHOT flag.
@@ -1138,6 +1125,28 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags)
 	}
 }
 
+int io_read_fixed(struct io_kiocb *req, unsigned int issue_flags)
+{
+	int ret;
+
+	ret = io_init_rw_fixed(req, issue_flags, ITER_DEST);
+	if (ret)
+		return ret;
+
+	return io_read(req, issue_flags);
+}
+
+int io_write_fixed(struct io_kiocb *req, unsigned int issue_flags)
+{
+	int ret;
+
+	ret = io_init_rw_fixed(req, issue_flags, ITER_SOURCE);
+	if (ret)
+		return ret;
+
+	return io_write(req, issue_flags);
+}
+
 void io_rw_fail(struct io_kiocb *req)
 {
 	int res;
diff --git a/io_uring/rw.h b/io_uring/rw.h
index a45e0c71b59d6..42a491d277273 100644
--- a/io_uring/rw.h
+++ b/io_uring/rw.h
@@ -30,14 +30,14 @@ struct io_async_rw {
 	);
 };
 
-int io_prep_read_fixed(struct io_kiocb *req, const struct io_uring_sqe *sqe);
-int io_prep_write_fixed(struct io_kiocb *req, const struct io_uring_sqe *sqe);
 int io_prep_readv(struct io_kiocb *req, const struct io_uring_sqe *sqe);
 int io_prep_writev(struct io_kiocb *req, const struct io_uring_sqe *sqe);
 int io_prep_read(struct io_kiocb *req, const struct io_uring_sqe *sqe);
 int io_prep_write(struct io_kiocb *req, const struct io_uring_sqe *sqe);
 int io_read(struct io_kiocb *req, unsigned int issue_flags);
 int io_write(struct io_kiocb *req, unsigned int issue_flags);
+int io_read_fixed(struct io_kiocb *req, unsigned int issue_flags);
+int io_write_fixed(struct io_kiocb *req, unsigned int issue_flags);
 void io_readv_writev_cleanup(struct io_kiocb *req);
 void io_rw_fail(struct io_kiocb *req);
 void io_req_rw_complete(struct io_kiocb *req, io_tw_token_t tw);

From patchwork Mon Feb 24 21:31:12 2025
From: Keith Busch
Subject: [PATCHv5 07/11] io_uring: add support for kernel registered bvecs
Date: Mon, 24 Feb 2025 13:31:12 -0800
Message-ID: <20250224213116.3509093-8-kbusch@meta.com>

From: Keith Busch

Provide an interface for the kernel to leverage the existing
pre-registered buffers that io_uring provides. User space can reference
these later to achieve zero-copy IO. User space must register an empty
fixed buffer table with io_uring in order for the kernel to make use of
it.
Signed-off-by: Keith Busch
---
 include/linux/io_uring/cmd.h |   7 ++
 io_uring/rsrc.c              | 123 +++++++++++++++++++++++++++++++++--
 io_uring/rsrc.h              |   8 +++
 3 files changed, 131 insertions(+), 7 deletions(-)

diff --git a/include/linux/io_uring/cmd.h b/include/linux/io_uring/cmd.h
index 87150dc0a07cf..cf8d80d847344 100644
--- a/include/linux/io_uring/cmd.h
+++ b/include/linux/io_uring/cmd.h
@@ -4,6 +4,7 @@
 #include
 #include
+#include
 
 /* only top 8 bits of sqe->uring_cmd_flags for kernel internal use */
 #define IORING_URING_CMD_CANCELABLE	(1U << 30)
@@ -125,4 +126,10 @@ static inline struct io_uring_cmd_data *io_uring_cmd_get_async_data(struct io_ur
 	return cmd_to_io_kiocb(cmd)->async_data;
 }
 
+int io_buffer_register_bvec(struct io_uring_cmd *cmd, struct request *rq,
+			    void (*release)(void *), unsigned int index,
+			    unsigned int issue_flags);
+void io_buffer_unregister_bvec(struct io_uring_cmd *cmd, unsigned int index,
+			       unsigned int issue_flags);
+
 #endif /* _LINUX_IO_URING_CMD_H */
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index f814526982c36..e0c6ed3aef5b5 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -9,6 +9,7 @@
 #include
 #include
 #include
+#include
 
 #include
 
@@ -104,14 +105,21 @@ int io_buffer_validate(struct iovec *iov)
 static void io_buffer_unmap(struct io_ring_ctx *ctx, struct io_rsrc_node *node)
 {
 	struct io_mapped_ubuf *imu = node->buf;
-	unsigned int i;
 
 	if (!refcount_dec_and_test(&imu->refs))
 		return;
-	for (i = 0; i < imu->nr_bvecs; i++)
-		unpin_user_page(imu->bvec[i].bv_page);
-	if (imu->acct_pages)
-		io_unaccount_mem(ctx, imu->acct_pages);
+
+	if (imu->release) {
+		imu->release(imu->priv);
+	} else {
+		unsigned int i;
+
+		for (i = 0; i < imu->nr_bvecs; i++)
+			unpin_user_page(imu->bvec[i].bv_page);
+		if (imu->acct_pages)
+			io_unaccount_mem(ctx, imu->acct_pages);
+	}
+
 	kvfree(imu);
 }
 
@@ -761,6 +769,9 @@ static struct io_rsrc_node *io_sqe_buffer_register(struct io_ring_ctx *ctx,
 	imu->len = iov->iov_len;
 	imu->nr_bvecs = nr_pages;
 	imu->folio_shift = PAGE_SHIFT;
+	imu->release = NULL;
+	imu->priv = NULL;
+	imu->perm = IO_IMU_READABLE | IO_IMU_WRITEABLE;
 	if (coalesced)
 		imu->folio_shift = data.folio_shift;
 	refcount_set(&imu->refs, 1);
@@ -857,6 +868,95 @@ int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
 	return ret;
 }
 
+int io_buffer_register_bvec(struct io_uring_cmd *cmd, struct request *rq,
+			    void (*release)(void *), unsigned int index,
+			    unsigned int issue_flags)
+{
+	struct io_ring_ctx *ctx = cmd_to_io_kiocb(cmd)->ctx;
+	struct io_rsrc_data *data = &ctx->buf_table;
+	struct req_iterator rq_iter;
+	struct io_mapped_ubuf *imu;
+	struct io_rsrc_node *node;
+	struct bio_vec bv, *bvec;
+	u16 nr_bvecs;
+	int ret = 0;
+
+	io_ring_submit_lock(ctx, issue_flags);
+	if (index >= data->nr) {
+		ret = -EINVAL;
+		goto unlock;
+	}
+	index = array_index_nospec(index, data->nr);
+
+	if (data->nodes[index]) {
+		ret = -EBUSY;
+		goto unlock;
+	}
+
+	node = io_rsrc_node_alloc(IORING_RSRC_BUFFER);
+	if (!node) {
+		ret = -ENOMEM;
+		goto unlock;
+	}
+
+	nr_bvecs = blk_rq_nr_phys_segments(rq);
+	imu = kvmalloc(struct_size(imu, bvec, nr_bvecs), GFP_KERNEL);
+	if (!imu) {
+		kfree(node);
+		ret = -ENOMEM;
+		goto unlock;
+	}
+
+	imu->ubuf = 0;
+	imu->len = blk_rq_bytes(rq);
+	imu->acct_pages = 0;
+	imu->folio_shift = PAGE_SHIFT;
+	imu->nr_bvecs = nr_bvecs;
+	refcount_set(&imu->refs, 1);
+	imu->release = release;
+	imu->priv = rq;
+
+	if (op_is_write(req_op(rq)))
+		imu->perm = IO_IMU_WRITEABLE;
+	else
+		imu->perm = IO_IMU_READABLE;
+
+	bvec = imu->bvec;
+	rq_for_each_bvec(bv, rq, rq_iter)
+		*bvec++ = bv;
+
+	node->buf = imu;
+	data->nodes[index] = node;
+unlock:
+	io_ring_submit_unlock(ctx, issue_flags);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(io_buffer_register_bvec);
+
+void io_buffer_unregister_bvec(struct io_uring_cmd *cmd, unsigned int index,
+			       unsigned int issue_flags)
+{
+	struct io_ring_ctx *ctx = cmd_to_io_kiocb(cmd)->ctx;
+	struct io_rsrc_data *data = &ctx->buf_table;
+	struct io_rsrc_node *node;
+
+	io_ring_submit_lock(ctx, issue_flags);
+	if (index >= data->nr)
+		goto unlock;
+	index = array_index_nospec(index, data->nr);
+
+	node = data->nodes[index];
+	if (!node || !node->buf->release)
+		goto unlock;
+
+	io_put_rsrc_node(ctx, node);
+	data->nodes[index] = NULL;
+unlock:
+	io_ring_submit_unlock(ctx, issue_flags);
+}
+EXPORT_SYMBOL_GPL(io_buffer_unregister_bvec);
+
 static int io_import_fixed(int ddir, struct iov_iter *iter,
 			   struct io_mapped_ubuf *imu,
 			   u64 buf_addr, size_t len)
@@ -871,6 +971,8 @@ static int io_import_fixed(int ddir, struct iov_iter *iter,
 	/* not inside the mapped region */
 	if (unlikely(buf_addr < imu->ubuf || buf_end > (imu->ubuf + imu->len)))
 		return -EFAULT;
+	if (!(imu->perm & (1 << ddir)))
+		return -EFAULT;
 
 	/*
 	 * Might not be a start of buffer, set size appropriately
@@ -883,8 +985,8 @@ static int io_import_fixed(int ddir, struct iov_iter *iter,
 	/*
 	 * Don't use iov_iter_advance() here, as it's really slow for
 	 * using the latter parts of a big fixed buffer - it iterates
-	 * over each segment manually. We can cheat a bit here, because
-	 * we know that:
+	 * over each segment manually. We can cheat a bit here for user
+	 * registered nodes, because we know that:
 	 *
 	 * 1) it's a BVEC iter, we set it up
 	 * 2) all bvecs are the same in size, except potentially the
@@ -898,8 +1000,15 @@ static int io_import_fixed(int ddir, struct iov_iter *iter,
 	 */
 	const struct bio_vec *bvec = imu->bvec;
 
+	/*
+	 * Kernel buffer bvecs, on the other hand, don't necessarily
+	 * have the size property of user registered ones, so we have
+	 * to use the slow iter advance.
+	 */
 	if (offset < bvec->bv_len) {
 		iter->iov_offset = offset;
+	} else if (imu->release) {
+		iov_iter_advance(iter, offset);
 	} else {
 		unsigned long seg_skip;
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index f0e9080599646..64bf35667cf9c 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -20,6 +20,11 @@ struct io_rsrc_node {
 	};
 };
 
+enum {
+	IO_IMU_READABLE		= 1 << 0,
+	IO_IMU_WRITEABLE	= 1 << 1,
+};
+
 struct io_mapped_ubuf {
 	u64		ubuf;
 	unsigned int	len;
@@ -27,6 +32,9 @@ struct io_mapped_ubuf {
 	unsigned int	folio_shift;
 	refcount_t	refs;
 	unsigned long	acct_pages;
+	void		(*release)(void *);
+	void		*priv;
+	u8		perm;
 	struct bio_vec	bvec[] __counted_by(nr_bvecs);
 };

From patchwork Mon Feb 24 21:31:13 2025
From: Keith Busch
CC: Xinyu Zhang, Keith Busch
Subject: [PATCHv5 08/11] nvme: map uring_cmd data even if address is 0
Date: Mon, 24 Feb 2025 13:31:13 -0800
Message-ID: <20250224213116.3509093-9-kbusch@meta.com>

From: Xinyu Zhang

When using kernel registered bvec fixed buffers, the "address" is
actually the offset into the bvec rather than a userspace address, so
it can legitimately be 0. Skip checking whether the address is NULL
before mapping uring_cmd data; a bad userspace address is handled
properly later, when the user buffer is imported.

With this patch, kernel registered bvec fixed buffers can be used in
io_uring NVMe passthru with ublk zero-copy support:
https://lore.kernel.org/io-uring/20250218224229.837848-1-kbusch@meta.com/T/#u

Reviewed-by: Caleb Sander Mateos
Reviewed-by: Jens Axboe
Reviewed-by: Keith Busch
Signed-off-by: Xinyu Zhang
Reviewed-by: Ming Lei
---
 drivers/nvme/host/ioctl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index e0876bc9aacde..fe9fb80c6a144 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -513,7 +513,7 @@ static int nvme_uring_cmd_io(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 		return PTR_ERR(req);
 	req->timeout = d.timeout_ms ? msecs_to_jiffies(d.timeout_ms) : 0;
 
-	if (d.addr && d.data_len) {
+	if (d.data_len) {
 		ret = nvme_map_user_request(req, d.addr, d.data_len,
 			nvme_to_user_ptr(d.metadata), d.metadata_len,
 			ioucmd, vec, issue_flags);

From patchwork Mon Feb 24 21:31:14 2025
From: Keith Busch
Subject: [PATCHv5 09/11] ublk: zc register/unregister bvec
Date: Mon, 24 Feb 2025 13:31:14 -0800
Message-ID: <20250224213116.3509093-10-kbusch@meta.com>
From: Keith Busch

Provide new operations for the user to request mapping an active
request to an io_uring instance's buf_table. The user has to provide
the index at which it wants to install the buffer. A reference count is
taken on the request to ensure it can't be completed while it is active
in a ring's buf_table.

Signed-off-by: Keith Busch
Reviewed-by: Pavel Begunkov # io_uring
---
 drivers/block/ublk_drv.c      | 117 +++++++++++++++++++++++-----------
 include/uapi/linux/ublk_cmd.h |   4 ++
 2 files changed, 85 insertions(+), 36 deletions(-)

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 529085181f355..a719d873e3882 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -51,6 +51,9 @@
 /* private ioctl command mirror */
 #define UBLK_CMD_DEL_DEV_ASYNC	_IOC_NR(UBLK_U_CMD_DEL_DEV_ASYNC)
 
+#define UBLK_IO_REGISTER_IO_BUF		_IOC_NR(UBLK_U_IO_REGISTER_IO_BUF)
+#define UBLK_IO_UNREGISTER_IO_BUF	_IOC_NR(UBLK_U_IO_UNREGISTER_IO_BUF)
+
 /* All UBLK_F_* have to be included into UBLK_F_ALL */
 #define UBLK_F_ALL (UBLK_F_SUPPORT_ZERO_COPY \
 		| UBLK_F_URING_CMD_COMP_IN_TASK \
@@ -201,7 +204,7 @@ static inline struct ublksrv_io_desc *ublk_get_iod(struct ublk_queue *ubq,
 						   int tag);
 static inline bool ublk_dev_is_user_copy(const struct ublk_device *ub)
 {
-	return ub->dev_info.flags & UBLK_F_USER_COPY;
+	return ub->dev_info.flags & (UBLK_F_USER_COPY | UBLK_F_SUPPORT_ZERO_COPY);
 }
 
 static inline bool ublk_dev_is_zoned(const struct ublk_device *ub)
@@ -581,7 +584,7 @@ static void ublk_apply_params(struct ublk_device *ub)
 
 static inline bool ublk_support_user_copy(const struct ublk_queue *ubq)
 {
-	return ubq->flags & UBLK_F_USER_COPY;
+	return ubq->flags & (UBLK_F_USER_COPY | UBLK_F_SUPPORT_ZERO_COPY);
 }
 
 static inline bool ublk_need_req_ref(const struct ublk_queue *ubq)
@@ -1747,6 +1750,77 @@ static inline void ublk_prep_cancel(struct io_uring_cmd *cmd,
 		io_uring_cmd_mark_cancelable(cmd, issue_flags);
 }
 
+static inline struct request *__ublk_check_and_get_req(struct ublk_device *ub,
+		struct ublk_queue *ubq, int tag, size_t offset)
+{
+	struct request *req;
+
+	if (!ublk_need_req_ref(ubq))
+		return NULL;
+
+	req = blk_mq_tag_to_rq(ub->tag_set.tags[ubq->q_id], tag);
+	if (!req)
+		return NULL;
+
+	if (!ublk_get_req_ref(ubq, req))
+		return NULL;
+
+	if (unlikely(!blk_mq_request_started(req) || req->tag != tag))
+		goto fail_put;
+
+	if (!ublk_rq_has_data(req))
+		goto fail_put;
+
+	if (offset > blk_rq_bytes(req))
+		goto fail_put;
+
+	return req;
+fail_put:
+	ublk_put_req_ref(ubq, req);
+	return NULL;
+}
+
+static void ublk_io_release(void *priv)
+{
+	struct request *rq = priv;
+	struct ublk_queue *ubq = rq->mq_hctx->driver_data;
+
+	ublk_put_req_ref(ubq, rq);
+}
+
+static int ublk_register_io_buf(struct io_uring_cmd *cmd,
+				struct ublk_queue *ubq, unsigned int tag,
+				const struct ublksrv_io_cmd *ub_cmd,
+				unsigned int issue_flags)
+{
+	struct ublk_device *ub = cmd->file->private_data;
+	int index = (int)ub_cmd->addr, ret;
+	struct request *req;
+
+	req = __ublk_check_and_get_req(ub, ubq, tag, 0);
+	if (!req)
+		return -EINVAL;
+
+	ret = io_buffer_register_bvec(cmd, req, ublk_io_release, index,
+				      issue_flags);
+	if (ret) {
+		ublk_put_req_ref(ubq, req);
+		return ret;
+	}
+
+	return 0;
+}
+
+static int ublk_unregister_io_buf(struct io_uring_cmd *cmd,
+				  const struct ublksrv_io_cmd *ub_cmd,
+				  unsigned int issue_flags)
+{
+	int index = (int)ub_cmd->addr;
+
+	io_buffer_unregister_bvec(cmd, index, issue_flags);
+	return 0;
+}
+
 static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
 			       unsigned int issue_flags,
 			       const struct ublksrv_io_cmd *ub_cmd)
@@ -1798,6 +1872,10 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
 
 	ret = -EINVAL;
 	switch (_IOC_NR(cmd_op)) {
+	case UBLK_IO_REGISTER_IO_BUF:
+		return ublk_register_io_buf(cmd, ubq, tag, ub_cmd, issue_flags);
+	case UBLK_IO_UNREGISTER_IO_BUF:
+		return ublk_unregister_io_buf(cmd, ub_cmd, issue_flags);
 	case UBLK_IO_FETCH_REQ:
 		/* UBLK_IO_FETCH_REQ is only allowed before queue is setup */
 		if (ublk_queue_ready(ubq)) {
@@ -1872,36 +1950,6 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
 	return -EIOCBQUEUED;
 }
 
-static inline struct request *__ublk_check_and_get_req(struct ublk_device *ub,
-		struct ublk_queue *ubq, int tag, size_t offset)
-{
-	struct request *req;
-
-	if (!ublk_need_req_ref(ubq))
-		return NULL;
-
-	req = blk_mq_tag_to_rq(ub->tag_set.tags[ubq->q_id], tag);
-	if (!req)
-		return NULL;
-
-	if (!ublk_get_req_ref(ubq, req))
-		return NULL;
-
-	if (unlikely(!blk_mq_request_started(req) || req->tag != tag))
-		goto fail_put;
-
-	if (!ublk_rq_has_data(req))
-		goto fail_put;
-
-	if (offset > blk_rq_bytes(req))
-		goto fail_put;
-
-	return req;
-fail_put:
-	ublk_put_req_ref(ubq, req);
-	return NULL;
-}
-
 static inline int ublk_ch_uring_cmd_local(struct io_uring_cmd *cmd,
 					  unsigned int issue_flags)
 {
@@ -2527,9 +2575,6 @@ static int ublk_ctrl_add_dev(struct io_uring_cmd *cmd)
 		goto out_free_dev_number;
 	}
 
-	/* We are not ready to support zero copy */
-	ub->dev_info.flags &= ~UBLK_F_SUPPORT_ZERO_COPY;
-
 	ub->dev_info.nr_hw_queues = min_t(unsigned int,
 			ub->dev_info.nr_hw_queues, nr_cpu_ids);
 	ublk_align_max_io_size(ub);
@@ -2860,7 +2905,7 @@ static int ublk_ctrl_get_features(struct io_uring_cmd *cmd)
 {
 	const struct ublksrv_ctrl_cmd *header = io_uring_sqe_cmd(cmd->sqe);
 	void __user *argp = (void __user *)(unsigned long)header->addr;
-	u64 features = UBLK_F_ALL & ~UBLK_F_SUPPORT_ZERO_COPY;
+	u64 features = UBLK_F_ALL;
 
 	if (header->len != UBLK_FEATURES_LEN || !header->addr)
 		return -EINVAL;
diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h
index a8bc98bb69fce..74246c926b55f 100644
--- a/include/uapi/linux/ublk_cmd.h
+++ b/include/uapi/linux/ublk_cmd.h
@@ -94,6 +94,10 @@
 	_IOWR('u', UBLK_IO_COMMIT_AND_FETCH_REQ, struct ublksrv_io_cmd)
 #define UBLK_U_IO_NEED_GET_DATA \
 	_IOWR('u', UBLK_IO_NEED_GET_DATA, struct ublksrv_io_cmd)
+#define UBLK_U_IO_REGISTER_IO_BUF \
+	_IOWR('u', 0x23, struct ublksrv_io_cmd)
+#define UBLK_U_IO_UNREGISTER_IO_BUF \
+	_IOWR('u', 0x24, struct ublksrv_io_cmd)
 
 /* only ABORT means that no re-fetch */
 #define UBLK_IO_RES_OK		0

From patchwork Mon Feb 24 21:31:15 2025
Subject: [PATCHv5 10/11] io_uring: add abstraction for buf_table rsrc data
From: Keith Busch
Date: Mon, 24 Feb 2025 13:31:15 -0800
Message-ID: <20250224213116.3509093-11-kbusch@meta.com>

We'll need to add more fields specific to the registered buffers, so
make a layer for it now. No functional change in this patch.

Reviewed-by: Caleb Sander Mateos
Signed-off-by: Keith Busch
Reviewed-by: Pavel Begunkov
---
 include/linux/io_uring_types.h |  6 +++-
 io_uring/fdinfo.c              |  8 +++---
 io_uring/nop.c                 |  2 +-
 io_uring/register.c            |  2 +-
 io_uring/rsrc.c                | 51 +++++++++++++++++-----------
 5 files changed, 36 insertions(+), 33 deletions(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index c0fe8a00fe53a..a05ae4cb98a4c 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -69,6 +69,10 @@ struct io_file_table {
 	unsigned int alloc_hint;
 };
 
+struct io_buf_table {
+	struct io_rsrc_data data;
+};
+
 struct io_hash_bucket {
 	struct hlist_head list;
 } ____cacheline_aligned_in_smp;
@@ -293,7 +297,7 @@ struct io_ring_ctx {
 		struct io_wq_work_list	iopoll_list;
 
 		struct io_file_table	file_table;
-		struct io_rsrc_data	buf_table;
+		struct io_buf_table	buf_table;
 
 		struct io_submit_state	submit_state;
 
diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
index f60d0a9d505e2..d389c06cbce10 100644
--- a/io_uring/fdinfo.c
+++ b/io_uring/fdinfo.c
@@ -217,12 +217,12 @@ __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *file)
 			seq_puts(m, "\n");
 		}
 	}
-	seq_printf(m, "UserBufs:\t%u\n", ctx->buf_table.nr);
-	for (i = 0; has_lock && i < ctx->buf_table.nr; i++) {
+	seq_printf(m, "UserBufs:\t%u\n", ctx->buf_table.data.nr);
+	for (i = 0; has_lock && i < ctx->buf_table.data.nr; i++) {
 		struct io_mapped_ubuf *buf = NULL;
 
-		if (ctx->buf_table.nodes[i])
-			buf = ctx->buf_table.nodes[i]->buf;
+		if (ctx->buf_table.data.nodes[i])
+			buf = ctx->buf_table.data.nodes[i]->buf;
 		if (buf)
 			seq_printf(m, "%5u: 0x%llx/%u\n", i, buf->ubuf, buf->len);
 		else
diff --git a/io_uring/nop.c b/io_uring/nop.c
index ea539531cb5f6..da8870e00eee7 100644
--- a/io_uring/nop.c
+++ b/io_uring/nop.c
@@ -66,7 +66,7 @@ int io_nop(struct io_kiocb *req, unsigned int issue_flags)
 		ret = -EFAULT;
 	io_ring_submit_lock(ctx, issue_flags);
-	node = io_rsrc_node_lookup(&ctx->buf_table, req->buf_index);
+	node = io_rsrc_node_lookup(&ctx->buf_table.data, req->buf_index);
 	if (node) {
 		io_req_assign_buf_node(req, node);
 		ret = 0;
diff --git a/io_uring/register.c b/io_uring/register.c
index cc23a4c205cd4..f15a8d52ad30f 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -926,7 +926,7 @@ SYSCALL_DEFINE4(io_uring_register, unsigned int, fd, unsigned int, opcode,
 		ret = __io_uring_register(ctx, opcode, arg, nr_args);
 
 	trace_io_uring_register(ctx, opcode, ctx->file_table.data.nr,
-				ctx->buf_table.nr, ret);
+				ctx->buf_table.data.nr, ret);
 	mutex_unlock(&ctx->uring_lock);
 
 	fput(file);
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index e0c6ed3aef5b5..70558317fbb2b 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -236,9 +236,9 @@ static int __io_sqe_buffers_update(struct io_ring_ctx *ctx,
 	__u32 done;
 	int i, err;
 
-	if (!ctx->buf_table.nr)
+	if (!ctx->buf_table.data.nr)
 		return -ENXIO;
-	if (up->offset + nr_args > ctx->buf_table.nr)
+	if (up->offset + nr_args > ctx->buf_table.data.nr)
 		return -EINVAL;
 
 	for (done = 0; done < nr_args; done++) {
@@ -270,9 +270,9 @@ static int __io_sqe_buffers_update(struct io_ring_ctx *ctx,
 			}
 			node->tag = tag;
 		}
-		i = array_index_nospec(up->offset + done, ctx->buf_table.nr);
-		io_reset_rsrc_node(ctx, &ctx->buf_table, i);
-		ctx->buf_table.nodes[i] = node;
+		i = array_index_nospec(up->offset + done, ctx->buf_table.data.nr);
+		io_reset_rsrc_node(ctx, &ctx->buf_table.data, i);
+		ctx->buf_table.data.nodes[i] = node;
 		if (ctx->compat)
 			user_data += sizeof(struct compat_iovec);
 		else
@@ -550,9 +550,9 @@ int io_sqe_files_register(struct io_ring_ctx *ctx, void __user *arg,
 
 int io_sqe_buffers_unregister(struct io_ring_ctx *ctx)
 {
-	if (!ctx->buf_table.nr)
+	if (!ctx->buf_table.data.nr)
 		return -ENXIO;
-	io_rsrc_data_free(ctx, &ctx->buf_table);
+	io_rsrc_data_free(ctx, &ctx->buf_table.data);
 	return 0;
 }
 
@@ -579,8 +579,8 @@ static bool headpage_already_acct(struct io_ring_ctx *ctx, struct page **pages,
 	}
 
 	/* check previously registered pages */
-	for (i = 0; i < ctx->buf_table.nr; i++) {
-		struct io_rsrc_node *node = ctx->buf_table.nodes[i];
+	for (i = 0; i < ctx->buf_table.data.nr; i++) {
+		struct io_rsrc_node *node = ctx->buf_table.data.nodes[i];
 		struct io_mapped_ubuf *imu;
 
 		if (!node)
@@ -809,7 +809,7 @@ int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
 
 	BUILD_BUG_ON(IORING_MAX_REG_BUFFERS >= (1u << 16));
 
-	if (ctx->buf_table.nr)
+	if (ctx->buf_table.data.nr)
 		return -EBUSY;
 	if (!nr_args || nr_args > IORING_MAX_REG_BUFFERS)
 		return -EINVAL;
@@ -862,7 +862,7 @@ int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
 		data.nodes[i] = node;
 	}
 
-	ctx->buf_table = data;
+	ctx->buf_table.data = data;
 	if (ret)
 		io_sqe_buffers_unregister(ctx);
 	return ret;
@@ -873,7 +873,7 @@ int io_buffer_register_bvec(struct io_uring_cmd *cmd, struct request *rq,
 			    unsigned int issue_flags)
 {
 	struct io_ring_ctx *ctx = cmd_to_io_kiocb(cmd)->ctx;
-	struct io_rsrc_data *data = &ctx->buf_table;
+	struct io_rsrc_data *data = &ctx->buf_table.data;
 	struct req_iterator rq_iter;
 	struct io_mapped_ubuf *imu;
 	struct io_rsrc_node *node;
@@ -938,7 +938,7 @@ void io_buffer_unregister_bvec(struct io_uring_cmd *cmd, unsigned int index,
 			       unsigned int issue_flags)
 {
 	struct io_ring_ctx *ctx = cmd_to_io_kiocb(cmd)->ctx;
-	struct io_rsrc_data *data = &ctx->buf_table;
+	struct io_rsrc_data *data = &ctx->buf_table.data;
 	struct io_rsrc_node *node;
 
 	io_ring_submit_lock(ctx, issue_flags);
@@ -1035,7 +1035,7 @@ static inline struct io_rsrc_node *io_find_buf_node(struct io_kiocb *req,
 		return req->buf_node;
 
 	io_ring_submit_lock(ctx, issue_flags);
-	node = io_rsrc_node_lookup(&ctx->buf_table, req->buf_index);
+	node = io_rsrc_node_lookup(&ctx->buf_table.data, req->buf_index);
 	if (node)
 		io_req_assign_buf_node(req, node);
 	io_ring_submit_unlock(ctx, issue_flags);
@@ -1085,10 +1085,10 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
 	if (!arg->nr && (arg->dst_off || arg->src_off))
 		return -EINVAL;
 	/* not allowed unless REPLACE is set */
-	if (ctx->buf_table.nr && !(arg->flags & IORING_REGISTER_DST_REPLACE))
+	if (ctx->buf_table.data.nr && !(arg->flags & IORING_REGISTER_DST_REPLACE))
 		return -EBUSY;
 
-	nbufs = src_ctx->buf_table.nr;
+	nbufs = src_ctx->buf_table.data.nr;
 	if (!arg->nr)
 		arg->nr = nbufs;
 	else if (arg->nr > nbufs)
@@ -1098,13 +1098,13 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
 	if (check_add_overflow(arg->nr, arg->dst_off, &nbufs))
 		return -EOVERFLOW;
 
-	ret = io_rsrc_data_alloc(&data, max(nbufs, ctx->buf_table.nr));
+	ret = io_rsrc_data_alloc(&data, max(nbufs, ctx->buf_table.data.nr));
 	if (ret)
 		return ret;
 
 	/* Fill entries in data from dst that won't overlap with src */
-	for (i = 0; i < min(arg->dst_off, ctx->buf_table.nr); i++) {
-		struct io_rsrc_node *src_node = ctx->buf_table.nodes[i];
+	for (i = 0; i < min(arg->dst_off, ctx->buf_table.data.nr); i++) {
+		struct io_rsrc_node *src_node = ctx->buf_table.data.nodes[i];
 
 		if (src_node) {
 			data.nodes[i] = src_node;
@@ -1113,7 +1113,7 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
 	}
 
 	ret = -ENXIO;
-	nbufs = src_ctx->buf_table.nr;
+	nbufs = src_ctx->buf_table.data.nr;
 	if (!nbufs)
 		goto out_free;
 	ret = -EINVAL;
@@ -1133,7 +1133,7 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
 	while (nr--) {
 		struct io_rsrc_node *dst_node, *src_node;
 
-		src_node = io_rsrc_node_lookup(&src_ctx->buf_table, i);
+		src_node = io_rsrc_node_lookup(&src_ctx->buf_table.data, i);
 		if (!src_node) {
 			dst_node = NULL;
 		} else {
@@ -1155,7 +1155,7 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
 	 * old and new nodes at this point.
 	 */
 	if (arg->flags & IORING_REGISTER_DST_REPLACE)
-		io_rsrc_data_free(ctx, &ctx->buf_table);
+		io_sqe_buffers_unregister(ctx);
 
 	/*
 	 * ctx->buf_table must be empty now - either the contents are being
@@ -1163,10 +1163,9 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
 	 * copied to a ring that does not have buffers yet (checked at function
 	 * entry).
 	 */
-	WARN_ON_ONCE(ctx->buf_table.nr);
-	ctx->buf_table = data;
+	WARN_ON_ONCE(ctx->buf_table.data.nr);
+	ctx->buf_table.data = data;
 	return 0;
-
 out_free:
 	io_rsrc_data_free(ctx, &data);
 	return ret;
@@ -1191,7 +1190,7 @@ int io_register_clone_buffers(struct io_ring_ctx *ctx, void __user *arg)
 		return -EFAULT;
 	if (buf.flags & ~(IORING_REGISTER_SRC_REGISTERED|IORING_REGISTER_DST_REPLACE))
 		return -EINVAL;
-	if (!(buf.flags & IORING_REGISTER_DST_REPLACE) && ctx->buf_table.nr)
+	if (!(buf.flags & IORING_REGISTER_DST_REPLACE) && ctx->buf_table.data.nr)
 		return -EBUSY;
 	if (memchr_inv(buf.pad, 0, sizeof(buf.pad)))
 		return -EINVAL;

Subject: [PATCHv5 11/11] io_uring: cache nodes and mapped buffers
From: Keith Busch
Date: Mon, 24 Feb 2025 13:31:16 -0800
Message-ID: <20250224213116.3509093-12-kbusch@meta.com>

Frequent alloc/free cycles on these are pretty costly. Use an io cache
to more efficiently reuse these buffers.
Signed-off-by: Keith Busch
---
 include/linux/io_uring_types.h |  18 ++---
 io_uring/filetable.c           |   2 +-
 io_uring/rsrc.c                | 120 +++++++++++++++++++++++++--------
 io_uring/rsrc.h                |   2 +-
 4 files changed, 104 insertions(+), 38 deletions(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index a05ae4cb98a4c..fda3221de2174 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -69,8 +69,18 @@ struct io_file_table {
 	unsigned int alloc_hint;
 };
 
+struct io_alloc_cache {
+	void **entries;
+	unsigned int nr_cached;
+	unsigned int max_cached;
+	unsigned int elem_size;
+	unsigned int init_clear;
+};
+
 struct io_buf_table {
 	struct io_rsrc_data data;
+	struct io_alloc_cache node_cache;
+	struct io_alloc_cache imu_cache;
 };
 
 struct io_hash_bucket {
@@ -224,14 +234,6 @@ struct io_submit_state {
 	struct blk_plug plug;
 };
 
-struct io_alloc_cache {
-	void **entries;
-	unsigned int nr_cached;
-	unsigned int max_cached;
-	unsigned int elem_size;
-	unsigned int init_clear;
-};
-
 struct io_ring_ctx {
 	/* const or read-mostly hot data */
 	struct {
diff --git a/io_uring/filetable.c b/io_uring/filetable.c
index dd8eeec97acf6..a21660e3145ab 100644
--- a/io_uring/filetable.c
+++ b/io_uring/filetable.c
@@ -68,7 +68,7 @@ static int io_install_fixed_file(struct io_ring_ctx *ctx, struct file *file,
 	if (slot_index >= ctx->file_table.data.nr)
 		return -EINVAL;
 
-	node = io_rsrc_node_alloc(IORING_RSRC_FILE);
+	node = io_rsrc_node_alloc(ctx, IORING_RSRC_FILE);
 	if (!node)
 		return -ENOMEM;
 
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 70558317fbb2b..43ee821e3f5d0 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -33,6 +33,8 @@ static struct io_rsrc_node *io_sqe_buffer_register(struct io_ring_ctx *ctx,
 #define IORING_MAX_FIXED_FILES	(1U << 20)
 #define IORING_MAX_REG_BUFFERS	(1U << 14)
 
+#define IO_CACHED_BVECS_SEGS	32
+
 int __io_account_mem(struct user_struct *user, unsigned long nr_pages)
 {
 	unsigned long page_limit, cur_pages, new_pages;
@@ -102,6 +104,22 @@ int io_buffer_validate(struct iovec *iov)
 	return 0;
 }
 
+static struct io_mapped_ubuf *io_alloc_imu(struct io_ring_ctx *ctx,
+					   int nr_bvecs)
+{
+	if (nr_bvecs <= IO_CACHED_BVECS_SEGS)
+		return io_cache_alloc(&ctx->buf_table.imu_cache, GFP_KERNEL);
+	return kvmalloc(struct_size_t(struct io_mapped_ubuf, bvec, nr_bvecs),
+			GFP_KERNEL);
+}
+
+static void io_free_imu(struct io_ring_ctx *ctx, struct io_mapped_ubuf *imu)
+{
+	if (imu->nr_bvecs > IO_CACHED_BVECS_SEGS ||
+	    !io_alloc_cache_put(&ctx->buf_table.imu_cache, imu))
+		kvfree(imu);
+}
+
 static void io_buffer_unmap(struct io_ring_ctx *ctx, struct io_rsrc_node *node)
 {
 	struct io_mapped_ubuf *imu = node->buf;
@@ -120,22 +138,35 @@ static void io_buffer_unmap(struct io_ring_ctx *ctx, struct io_rsrc_node *node)
 		io_unaccount_mem(ctx, imu->acct_pages);
 	}
 
-	kvfree(imu);
+	io_free_imu(ctx, imu);
 }
 
-struct io_rsrc_node *io_rsrc_node_alloc(int type)
+struct io_rsrc_node *io_rsrc_node_alloc(struct io_ring_ctx *ctx, int type)
 {
 	struct io_rsrc_node *node;
 
-	node = kzalloc(sizeof(*node), GFP_KERNEL);
+	if (type == IORING_RSRC_FILE)
+		node = kmalloc(sizeof(*node), GFP_KERNEL);
+	else
+		node = io_cache_alloc(&ctx->buf_table.node_cache, GFP_KERNEL);
 	if (node) {
 		node->type = type;
 		node->refs = 1;
+		node->tag = 0;
+		node->file_ptr = 0;
 	}
 	return node;
 }
 
-__cold void io_rsrc_data_free(struct io_ring_ctx *ctx, struct io_rsrc_data *data)
+static __cold void __io_rsrc_data_free(struct io_rsrc_data *data)
+{
+	kvfree(data->nodes);
+	data->nodes = NULL;
+	data->nr = 0;
+}
+
+__cold void io_rsrc_data_free(struct io_ring_ctx *ctx,
+			      struct io_rsrc_data *data)
 {
 	if (!data->nr)
 		return;
@@ -143,9 +174,7 @@ __cold void io_rsrc_data_free(struct io_ring_ctx *ctx, struct io_rsrc_data *data
 		if (data->nodes[data->nr])
 			io_put_rsrc_node(ctx, data->nodes[data->nr]);
 	}
-	kvfree(data->nodes);
-	data->nodes = NULL;
-	data->nr = 0;
+	__io_rsrc_data_free(data);
 }
 
 __cold int io_rsrc_data_alloc(struct io_rsrc_data *data, unsigned nr)
@@ -159,6 +188,31 @@ __cold int io_rsrc_data_alloc(struct io_rsrc_data *data, unsigned nr)
 	return -ENOMEM;
 }
 
+static __cold int io_rsrc_buffer_alloc(struct io_buf_table *table, unsigned nr)
+{
+	const int imu_cache_size = struct_size_t(struct io_mapped_ubuf, bvec,
+						 IO_CACHED_BVECS_SEGS);
+	const int node_size = sizeof(struct io_rsrc_node);
+	int ret;
+
+	ret = io_rsrc_data_alloc(&table->data, nr);
+	if (ret)
+		return ret;
+
+	if (io_alloc_cache_init(&table->node_cache, nr, node_size, 0))
+		goto free_data;
+
+	if (io_alloc_cache_init(&table->imu_cache, nr, imu_cache_size, 0))
+		goto free_cache;
+
+	return 0;
+free_cache:
+	io_alloc_cache_free(&table->node_cache, kfree);
+free_data:
+	__io_rsrc_data_free(&table->data);
+	return -ENOMEM;
+}
+
 static int __io_sqe_files_update(struct io_ring_ctx *ctx,
 				 struct io_uring_rsrc_update2 *up,
 				 unsigned nr_args)
@@ -208,7 +262,7 @@ static int __io_sqe_files_update(struct io_ring_ctx *ctx,
 			err = -EBADF;
 			break;
 		}
-		node = io_rsrc_node_alloc(IORING_RSRC_FILE);
+		node = io_rsrc_node_alloc(ctx, IORING_RSRC_FILE);
 		if (!node) {
 			err = -ENOMEM;
 			fput(file);
@@ -460,6 +514,8 @@ void io_free_rsrc_node(struct io_ring_ctx *ctx, struct io_rsrc_node *node)
 	case IORING_RSRC_BUFFER:
 		if (node->buf)
 			io_buffer_unmap(ctx, node);
+		if (io_alloc_cache_put(&ctx->buf_table.node_cache, node))
+			return;
 		break;
 	default:
 		WARN_ON_ONCE(1);
@@ -528,7 +584,7 @@ int io_sqe_files_register(struct io_ring_ctx *ctx, void __user *arg,
 			goto fail;
 		}
 		ret = -ENOMEM;
-		node = io_rsrc_node_alloc(IORING_RSRC_FILE);
+		node = io_rsrc_node_alloc(ctx, IORING_RSRC_FILE);
 		if (!node) {
 			fput(file);
 			goto fail;
@@ -548,11 +604,19 @@ int io_sqe_files_register(struct io_ring_ctx *ctx, void __user *arg,
 	return ret;
 }
 
+static void io_rsrc_buffer_free(struct io_ring_ctx *ctx,
+				struct io_buf_table *table)
+{
+	io_rsrc_data_free(ctx, &table->data);
+	io_alloc_cache_free(&table->node_cache, kfree);
+	io_alloc_cache_free(&table->imu_cache, kfree);
+}
+
 int io_sqe_buffers_unregister(struct io_ring_ctx *ctx)
 {
 	if (!ctx->buf_table.data.nr)
 		return -ENXIO;
-	io_rsrc_data_free(ctx, &ctx->buf_table.data);
+	io_rsrc_buffer_free(ctx, &ctx->buf_table);
 	return 0;
 }
 
@@ -733,7 +797,7 @@ static struct io_rsrc_node *io_sqe_buffer_register(struct io_ring_ctx *ctx,
 	if (!iov->iov_base)
 		return NULL;
 
-	node = io_rsrc_node_alloc(IORING_RSRC_BUFFER);
+	node = io_rsrc_node_alloc(ctx, IORING_RSRC_BUFFER);
 	if (!node)
 		return ERR_PTR(-ENOMEM);
 	node->buf = NULL;
@@ -753,7 +817,7 @@ static struct io_rsrc_node *io_sqe_buffer_register(struct io_ring_ctx *ctx,
 		coalesced = io_coalesce_buffer(&pages, &nr_pages, &data);
 	}
 
-	imu = kvmalloc(struct_size(imu, bvec, nr_pages), GFP_KERNEL);
+	imu = io_alloc_imu(ctx, nr_pages);
 	if (!imu)
 		goto done;
 
@@ -789,7 +853,7 @@ static struct io_rsrc_node *io_sqe_buffer_register(struct io_ring_ctx *ctx,
 	}
 done:
 	if (ret) {
-		kvfree(imu);
+		io_free_imu(ctx, imu);
 		if (node)
 			io_put_rsrc_node(ctx, node);
 		node = ERR_PTR(ret);
@@ -802,9 +866,9 @@ int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
 			    unsigned int nr_args, u64 __user *tags)
 {
 	struct page *last_hpage = NULL;
-	struct io_rsrc_data data;
 	struct iovec fast_iov, *iov = &fast_iov;
 	const struct iovec __user *uvec;
+	struct io_buf_table table;
 	int i, ret;
 
 	BUILD_BUG_ON(IORING_MAX_REG_BUFFERS >= (1u << 16));
@@ -813,13 +877,14 @@ int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
 		return -EBUSY;
 	if (!nr_args || nr_args > IORING_MAX_REG_BUFFERS)
 		return -EINVAL;
-	ret = io_rsrc_data_alloc(&data, nr_args);
+	ret = io_rsrc_buffer_alloc(&table, nr_args);
 	if (ret)
 		return ret;
 
 	if (!arg)
 		memset(iov, 0, sizeof(*iov));
 
+	ctx->buf_table = table;
 	for (i = 0; i < nr_args; i++) {
 		struct io_rsrc_node *node;
 		u64 tag = 0;
@@ -859,10 +924,8 @@ int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
 			}
 			node->tag = tag;
 		}
-		data.nodes[i] = node;
+		table.data.nodes[i] = node;
 	}
-
-	ctx->buf_table.data = data;
 	if (ret)
 		io_sqe_buffers_unregister(ctx);
 	return ret;
@@ -894,14 +957,15 @@ int io_buffer_register_bvec(struct io_uring_cmd *cmd, struct request *rq,
 		goto unlock;
 	}
 
-	node = io_rsrc_node_alloc(IORING_RSRC_BUFFER);
+	node = io_rsrc_node_alloc(ctx, IORING_RSRC_BUFFER);
 	if (!node) {
 		ret = -ENOMEM;
 		goto unlock;
 	}
 
 	nr_bvecs = blk_rq_nr_phys_segments(rq);
-	imu = kvmalloc(struct_size(imu, bvec, nr_bvecs), GFP_KERNEL);
+
+	imu = io_alloc_imu(ctx, nr_bvecs);
 	if (!imu) {
 		kfree(node);
 		ret = -ENOMEM;
@@ -1067,7 +1131,7 @@ static void lock_two_rings(struct io_ring_ctx *ctx1, struct io_ring_ctx *ctx2)
 static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx,
 			    struct io_uring_clone_buffers *arg)
 {
-	struct io_rsrc_data data;
+	struct io_buf_table table;
 	int i, ret, off, nr;
 	unsigned int nbufs;
 
@@ -1098,7 +1162,7 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
 	if (check_add_overflow(arg->nr, arg->dst_off, &nbufs))
 		return -EOVERFLOW;
 
-	ret = io_rsrc_data_alloc(&data, max(nbufs, ctx->buf_table.data.nr));
+	ret = io_rsrc_buffer_alloc(&table, max(nbufs, ctx->buf_table.data.nr));
 	if (ret)
 		return ret;
 
@@ -1107,7 +1171,7 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
 		struct io_rsrc_node *src_node = ctx->buf_table.data.nodes[i];
 
 		if (src_node) {
-			data.nodes[i] = src_node;
+			table.data.nodes[i] = src_node;
 			src_node->refs++;
 		}
 	}
@@ -1137,7 +1201,7 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
 		if (!src_node) {
 			dst_node = NULL;
 		} else {
-			dst_node = io_rsrc_node_alloc(IORING_RSRC_BUFFER);
+			dst_node = io_rsrc_node_alloc(ctx, IORING_RSRC_BUFFER);
 			if (!dst_node) {
 				ret = -ENOMEM;
 				goto out_free;
@@ -1146,12 +1210,12 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
 			refcount_inc(&src_node->buf->refs);
 			dst_node->buf = src_node->buf;
 		}
-		data.nodes[off++] = dst_node;
+		table.data.nodes[off++] = dst_node;
 		i++;
 	}
 
 	/*
-	 * If asked for replace, put the old table. data->nodes[] holds both
+	 * If asked for replace, put the old table. table.data->nodes[] holds both
 	 * old and new nodes at this point.
	 */
 	if (arg->flags & IORING_REGISTER_DST_REPLACE)
@@ -1164,10 +1228,10 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
 	 * entry).
 	 */
 	WARN_ON_ONCE(ctx->buf_table.data.nr);
-	ctx->buf_table.data = data;
+	ctx->buf_table = table;
 	return 0;
 out_free:
-	io_rsrc_data_free(ctx, &data);
+	io_rsrc_buffer_free(ctx, &table);
 	return ret;
 }
 
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index 64bf35667cf9c..92dd78be9546d 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -47,7 +47,7 @@ struct io_imu_folio_data {
 	unsigned int nr_folios;
 };
 
-struct io_rsrc_node *io_rsrc_node_alloc(int type);
+struct io_rsrc_node *io_rsrc_node_alloc(struct io_ring_ctx *ctx, int type);
 void io_free_rsrc_node(struct io_ring_ctx *ctx, struct io_rsrc_node *node);
 void io_rsrc_data_free(struct io_ring_ctx *ctx, struct io_rsrc_data *data);
 int io_rsrc_data_alloc(struct io_rsrc_data *data, unsigned nr);