From patchwork Tue Feb 11 00:56:41 2025
From: Keith Busch
Subject: [PATCHv2 1/6] io_uring: use node for import
Date: Mon, 10 Feb 2025 16:56:41 -0800
Message-ID: <20250211005646.222452-2-kbusch@meta.com>
In-Reply-To: <20250211005646.222452-1-kbusch@meta.com>
From: Jens Axboe

Replace the mapped buffer argument with its parent node. This prepares
for future node types that need their own specific handling.

Signed-off-by: Keith Busch
---
 io_uring/net.c       | 3 +--
 io_uring/rsrc.c      | 6 +++---
 io_uring/rsrc.h      | 5 ++---
 io_uring/rw.c        | 2 +-
 io_uring/uring_cmd.c | 2 +-
 5 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/io_uring/net.c b/io_uring/net.c
index 10344b3a6d89c..280d576e89249 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -1377,8 +1377,7 @@ static int io_send_zc_import(struct io_kiocb *req, unsigned int issue_flags)
 		return ret;
 
 	ret = io_import_fixed(ITER_SOURCE, &kmsg->msg.msg_iter,
-			      node->buf, (u64)(uintptr_t)sr->buf,
-			      sr->len);
+			      node, (u64)(uintptr_t)sr->buf, sr->len);
 	if (unlikely(ret))
 		return ret;
 	kmsg->msg.sg_from_iter = io_sg_from_iter;
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index af39b69eb4fde..4d0e1c06c8bc6 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -860,10 +860,10 @@ int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
 	return ret;
 }
 
-int io_import_fixed(int ddir, struct iov_iter *iter,
-		    struct io_mapped_ubuf *imu,
-		    u64 buf_addr, size_t len)
+int io_import_fixed(int ddir, struct iov_iter *iter, struct io_rsrc_node *node,
+		    u64 buf_addr, size_t len)
 {
+	struct io_mapped_ubuf *imu = node->buf;
 	u64 buf_end;
 	size_t offset;
 
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index 190f7ee45de93..abd0d5d42c3e1 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -50,9 +50,8 @@ void io_free_rsrc_node(struct io_ring_ctx *ctx, struct io_rsrc_node *node);
 void io_rsrc_data_free(struct io_ring_ctx *ctx, struct io_rsrc_data *data);
 int io_rsrc_data_alloc(struct io_rsrc_data *data, unsigned nr);
 
-int io_import_fixed(int ddir, struct iov_iter *iter,
-		    struct io_mapped_ubuf *imu,
-		    u64 buf_addr, size_t len);
+int io_import_fixed(int ddir, struct iov_iter *iter, struct io_rsrc_node *node,
+		    u64 buf_addr, size_t len);
 int io_register_clone_buffers(struct io_ring_ctx *ctx, void __user *arg);
 int io_sqe_buffers_unregister(struct io_ring_ctx *ctx);
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 7aa1e4c9f64a3..c25e0ab5c996b 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -369,7 +369,7 @@ static int io_prep_rw_fixed(struct io_kiocb *req, const struct io_uring_sqe *sqe
 	io_req_assign_buf_node(req, node);
 
 	io = req->async_data;
-	ret = io_import_fixed(ddir, &io->iter, node->buf, rw->addr, rw->len);
+	ret = io_import_fixed(ddir, &io->iter, node, rw->addr, rw->len);
 	iov_iter_save_state(&io->iter, &io->iter_state);
 	return ret;
 }
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 1f6a82128b475..aebbe2a4c7183 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -274,7 +274,7 @@ int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
 
 	/* Must have had rsrc_node assigned at prep time */
 	if (node)
-		return io_import_fixed(rw, iter, node->buf, ubuf, len);
+		return io_import_fixed(rw, iter, node, ubuf, len);
 
 	return -EFAULT;
 }

From patchwork Tue Feb 11 00:56:42 2025
From: Keith Busch
Subject: [PATCHv2 2/6] io_uring: create resource release callback
Date: Mon, 10 Feb 2025 16:56:42 -0800
Message-ID: <20250211005646.222452-3-kbusch@meta.com>
In-Reply-To: <20250211005646.222452-1-kbusch@meta.com>

When a registered resource is about to be freed, check whether it has
registered a release function, and call it if so. This prepares for
resources that are tied to an external component.
Signed-off-by: Keith Busch
---
 io_uring/rsrc.c | 2 ++
 io_uring/rsrc.h | 3 +++
 2 files changed, 5 insertions(+)

diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 4d0e1c06c8bc6..30f08cf13ef60 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -455,6 +455,8 @@ void io_free_rsrc_node(struct io_ring_ctx *ctx, struct io_rsrc_node *node)
 	case IORING_RSRC_BUFFER:
 		if (node->buf)
 			io_buffer_unmap(ctx, node);
+		if (node->release)
+			node->release(node->priv);
 		break;
 	default:
 		WARN_ON_ONCE(1);
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index abd0d5d42c3e1..a3826ab84e666 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -24,6 +24,9 @@ struct io_rsrc_node {
 		unsigned long file_ptr;
 		struct io_mapped_ubuf *buf;
 	};
+
+	void (*release)(void *);
+	void *priv;
 };
 
 struct io_mapped_ubuf {

From patchwork Tue Feb 11 00:56:43 2025
From: Keith Busch
Subject: [PATCHv2 3/6] io_uring: add support for kernel registered bvecs
Date: Mon, 10 Feb 2025 16:56:43 -0800
Message-ID: <20250211005646.222452-4-kbusch@meta.com>
In-Reply-To: <20250211005646.222452-1-kbusch@meta.com>

Provide an interface for the kernel to leverage the existing
pre-registered buffers that io_uring provides. User space can reference
these later to achieve zero-copy IO. User space must register an empty
fixed buffer table with io_uring in order for the kernel to make use of
it.
Signed-off-by: Keith Busch
---
 include/linux/io_uring.h       |   1 +
 include/linux/io_uring_types.h |   4 ++
 io_uring/rsrc.c                | 100 +++++++++++++++++++++++++++++++--
 io_uring/rsrc.h                |   1 +
 4 files changed, 100 insertions(+), 6 deletions(-)

diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index 85fe4e6b275c7..b5637a2aae340 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -5,6 +5,7 @@
 #include
 #include
 #include
+#include
 
 #if defined(CONFIG_IO_URING)
 void __io_uring_cancel(bool cancel_all);
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index e2fef264ff8b8..99aac2d52fbae 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -693,4 +693,8 @@ static inline bool io_ctx_cqe32(struct io_ring_ctx *ctx)
 	return ctx->flags & IORING_SETUP_CQE32;
 }
 
+int io_buffer_register_bvec(struct io_ring_ctx *ctx, struct request *rq,
+			    void (*release)(void *), unsigned int index);
+void io_buffer_unregister_bvec(struct io_ring_ctx *ctx, unsigned int tag);
+
 #endif
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 30f08cf13ef60..14efec8587888 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -110,8 +110,9 @@ static void io_buffer_unmap(struct io_ring_ctx *ctx, struct io_rsrc_node *node)
 	if (!refcount_dec_and_test(&imu->refs))
 		return;
 
-	for (i = 0; i < imu->nr_bvecs; i++)
-		unpin_user_page(imu->bvec[i].bv_page);
+	if (node->type == IORING_RSRC_BUFFER)
+		for (i = 0; i < imu->nr_bvecs; i++)
+			unpin_user_page(imu->bvec[i].bv_page);
 	if (imu->acct_pages)
 		io_unaccount_mem(ctx, imu->acct_pages);
 	kvfree(imu);
@@ -240,6 +241,13 @@ static int __io_sqe_buffers_update(struct io_ring_ctx *ctx,
 		struct io_rsrc_node *node;
 		u64 tag = 0;
 
+		i = array_index_nospec(up->offset + done, ctx->buf_table.nr);
+		node = io_rsrc_node_lookup(&ctx->buf_table, i);
+		if (node && node->type != IORING_RSRC_BUFFER) {
+			err = -EBUSY;
+			break;
+		}
+
 		uvec = u64_to_user_ptr(user_data);
 		iov = iovec_from_user(uvec, 1, 1, &fast_iov, ctx->compat);
 		if (IS_ERR(iov)) {
@@ -265,7 +273,6 @@ static int __io_sqe_buffers_update(struct io_ring_ctx *ctx,
 			}
 			node->tag = tag;
 		}
-		i = array_index_nospec(up->offset + done, ctx->buf_table.nr);
 		io_reset_rsrc_node(ctx, &ctx->buf_table, i);
 		ctx->buf_table.nodes[i] = node;
 		if (ctx->compat)
@@ -452,6 +459,7 @@ void io_free_rsrc_node(struct io_ring_ctx *ctx, struct io_rsrc_node *node)
 		if (io_slot_file(node))
 			fput(io_slot_file(node));
 		break;
+	case IORING_RSRC_KBUFFER:
 	case IORING_RSRC_BUFFER:
 		if (node->buf)
 			io_buffer_unmap(ctx, node);
@@ -862,6 +870,79 @@ int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
 	return ret;
 }
 
+int io_buffer_register_bvec(struct io_ring_ctx *ctx, struct request *rq,
+			    void (*release)(void *), unsigned int index)
+{
+	struct io_rsrc_data *data = &ctx->buf_table;
+	struct req_iterator rq_iter;
+	struct io_mapped_ubuf *imu;
+	struct io_rsrc_node *node;
+	struct bio_vec bv;
+	u16 nr_bvecs;
+	int i = 0;
+
+	lockdep_assert_held(&ctx->uring_lock);
+
+	if (!data->nr)
+		return -EINVAL;
+	if (index >= data->nr)
+		return -EINVAL;
+
+	node = data->nodes[index];
+	if (node)
+		return -EBUSY;
+
+	node = io_rsrc_node_alloc(IORING_RSRC_KBUFFER);
+	if (!node)
+		return -ENOMEM;
+
+	node->release = release;
+	node->priv = rq;
+
+	nr_bvecs = blk_rq_nr_phys_segments(rq);
+	imu = kvmalloc(struct_size(imu, bvec, nr_bvecs), GFP_KERNEL);
+	if (!imu) {
+		kfree(node);
+		return -ENOMEM;
+	}
+
+	imu->ubuf = 0;
+	imu->len = blk_rq_bytes(rq);
+	imu->acct_pages = 0;
+	imu->nr_bvecs = nr_bvecs;
+	refcount_set(&imu->refs, 1);
+	node->buf = imu;
+
+	rq_for_each_bvec(bv, rq, rq_iter)
+		bvec_set_page(&node->buf->bvec[i++], bv.bv_page, bv.bv_len,
+			      bv.bv_offset);
+	data->nodes[index] = node;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(io_buffer_register_bvec);
+
+void io_buffer_unregister_bvec(struct io_ring_ctx *ctx, unsigned int index)
+{
+	struct io_rsrc_data *data = &ctx->buf_table;
+	struct io_rsrc_node *node;
+
+	lockdep_assert_held(&ctx->uring_lock);
+
+	if (!data->nr)
+		return;
+	if (index >= data->nr)
+		return;
+
+	node = data->nodes[index];
+	if (!node || !node->buf)
+		return;
+	if (node->type != IORING_RSRC_KBUFFER)
+		return;
+	io_reset_rsrc_node(ctx, data, index);
+}
+EXPORT_SYMBOL_GPL(io_buffer_unregister_bvec);
+
 int io_import_fixed(int ddir, struct iov_iter *iter, struct io_rsrc_node *node,
 		    u64 buf_addr, size_t len)
 {
@@ -888,8 +969,8 @@ int io_import_fixed(int ddir, struct iov_iter *iter, struct io_rsrc_node *node,
 	/*
 	 * Don't use iov_iter_advance() here, as it's really slow for
 	 * using the latter parts of a big fixed buffer - it iterates
-	 * over each segment manually. We can cheat a bit here, because
-	 * we know that:
+	 * over each segment manually. We can cheat a bit here for user
+	 * registered nodes, because we know that:
 	 *
 	 * 1) it's a BVEC iter, we set it up
 	 * 2) all bvecs are the same in size, except potentially the
@@ -903,8 +984,15 @@ int io_import_fixed(int ddir, struct iov_iter *iter, struct io_rsrc_node *node,
 	 */
 	const struct bio_vec *bvec = imu->bvec;
 
+	/*
+	 * Kernel buffer bvecs, on the other hand, don't necessarily
+	 * have the size property of user registered ones, so we have
+	 * to use the slow iter advance.
+	 */
 	if (offset < bvec->bv_len) {
 		iter->iov_offset = offset;
+	} else if (node->type == IORING_RSRC_KBUFFER) {
+		iov_iter_advance(iter, offset);
 	} else {
 		unsigned long seg_skip;
@@ -1004,7 +1092,7 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
 		if (!src_node) {
 			dst_node = NULL;
 		} else {
-			dst_node = io_rsrc_node_alloc(IORING_RSRC_BUFFER);
+			dst_node = io_rsrc_node_alloc(src_node->type);
 			if (!dst_node) {
 				ret = -ENOMEM;
 				goto out_free;
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index a3826ab84e666..8147dfc26f737 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -13,6 +13,7 @@
 enum {
 	IORING_RSRC_FILE		= 0,
 	IORING_RSRC_BUFFER		= 1,
+	IORING_RSRC_KBUFFER		= 2,
 };
 
 struct io_rsrc_node {

From patchwork Tue Feb 11 00:56:44 2025
From: Keith Busch
Subject: [PATCHv2 4/6] ublk: zc register/unregister bvec
Date: Mon, 10 Feb 2025 16:56:44 -0800
Message-ID: <20250211005646.222452-5-kbusch@meta.com>
In-Reply-To: <20250211005646.222452-1-kbusch@meta.com>

Provide new operations for the user to request mapping an active
request to an io_uring instance's buf_table. The user has to provide
the index at which it wants to install the buffer. A reference count is
taken on the request to ensure it can't be completed while it is active
in a ring's buf_table.
Signed-off-by: Keith Busch --- drivers/block/ublk_drv.c | 145 +++++++++++++++++++++++++--------- include/uapi/linux/ublk_cmd.h | 4 + 2 files changed, 113 insertions(+), 36 deletions(-) diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index 529085181f355..ccfda7b2c24da 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -51,6 +51,9 @@ /* private ioctl command mirror */ #define UBLK_CMD_DEL_DEV_ASYNC _IOC_NR(UBLK_U_CMD_DEL_DEV_ASYNC) +#define UBLK_IO_REGISTER_IO_BUF _IOC_NR(UBLK_U_IO_REGISTER_IO_BUF) +#define UBLK_IO_UNREGISTER_IO_BUF _IOC_NR(UBLK_U_IO_UNREGISTER_IO_BUF) + /* All UBLK_F_* have to be included into UBLK_F_ALL */ #define UBLK_F_ALL (UBLK_F_SUPPORT_ZERO_COPY \ | UBLK_F_URING_CMD_COMP_IN_TASK \ @@ -76,6 +79,9 @@ struct ublk_rq_data { struct llist_node node; struct kref ref; + +#define UBLK_ZC_REGISTERED 0 + unsigned long flags; }; struct ublk_uring_cmd_pdu { @@ -201,7 +207,7 @@ static inline struct ublksrv_io_desc *ublk_get_iod(struct ublk_queue *ubq, int tag); static inline bool ublk_dev_is_user_copy(const struct ublk_device *ub) { - return ub->dev_info.flags & UBLK_F_USER_COPY; + return ub->dev_info.flags & (UBLK_F_USER_COPY | UBLK_F_SUPPORT_ZERO_COPY); } static inline bool ublk_dev_is_zoned(const struct ublk_device *ub) @@ -581,7 +587,7 @@ static void ublk_apply_params(struct ublk_device *ub) static inline bool ublk_support_user_copy(const struct ublk_queue *ubq) { - return ubq->flags & UBLK_F_USER_COPY; + return ubq->flags & (UBLK_F_USER_COPY | UBLK_F_SUPPORT_ZERO_COPY); } static inline bool ublk_need_req_ref(const struct ublk_queue *ubq) @@ -1747,6 +1753,102 @@ static inline void ublk_prep_cancel(struct io_uring_cmd *cmd, io_uring_cmd_mark_cancelable(cmd, issue_flags); } +static inline struct request *__ublk_check_and_get_req(struct ublk_device *ub, + struct ublk_queue *ubq, int tag, size_t offset) +{ + struct request *req; + + if (!ublk_need_req_ref(ubq)) + return NULL; + + req = 
blk_mq_tag_to_rq(ub->tag_set.tags[ubq->q_id], tag); + if (!req) + return NULL; + + if (!ublk_get_req_ref(ubq, req)) + return NULL; + + if (unlikely(!blk_mq_request_started(req) || req->tag != tag)) + goto fail_put; + + if (!ublk_rq_has_data(req)) + goto fail_put; + + if (offset > blk_rq_bytes(req)) + goto fail_put; + + return req; +fail_put: + ublk_put_req_ref(ubq, req); + return NULL; +} + +static void ublk_io_release(void *priv) +{ + struct request *rq = priv; + struct ublk_queue *ubq = rq->mq_hctx->driver_data; + + ublk_put_req_ref(ubq, rq); +} + +static int ublk_register_io_buf(struct io_uring_cmd *cmd, + struct ublk_queue *ubq, int tag, + const struct ublksrv_io_cmd *ub_cmd) +{ + struct io_ring_ctx *ctx = cmd_to_io_kiocb(cmd)->ctx; + struct ublk_device *ub = cmd->file->private_data; + int index = (int)ub_cmd->addr, ret; + struct ublk_rq_data *data; + struct request *req; + + if (!ub) + return -EPERM; + + req = __ublk_check_and_get_req(ub, ubq, tag, 0); + if (!req) + return -EINVAL; + + data = blk_mq_rq_to_pdu(req); + if (test_and_set_bit(UBLK_ZC_REGISTERED, &data->flags)) { + ublk_put_req_ref(ubq, req); + return -EBUSY; + } + + ret = io_buffer_register_bvec(ctx, req, ublk_io_release, index); + if (ret) { + clear_bit(UBLK_ZC_REGISTERED, &data->flags); + ublk_put_req_ref(ubq, req); + return ret; + } + + return 0; +} + +static int ublk_unregister_io_buf(struct io_uring_cmd *cmd, + struct ublk_queue *ubq, int tag, + const struct ublksrv_io_cmd *ub_cmd) +{ + struct io_ring_ctx *ctx = cmd_to_io_kiocb(cmd)->ctx; + struct ublk_device *ub = cmd->file->private_data; + int index = (int)ub_cmd->addr; + struct ublk_rq_data *data; + struct request *req; + + if (!ub) + return -EPERM; + + req = blk_mq_tag_to_rq(ub->tag_set.tags[ubq->q_id], tag); + if (!req) + return -EINVAL; + + data = blk_mq_rq_to_pdu(req); + if (!test_and_clear_bit(UBLK_ZC_REGISTERED, &data->flags)) + return -EINVAL; + + io_buffer_unregister_bvec(ctx, index); + return 0; +} + static int 
__ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags, const struct ublksrv_io_cmd *ub_cmd) @@ -1798,6 +1900,10 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd, ret = -EINVAL; switch (_IOC_NR(cmd_op)) { + case UBLK_IO_REGISTER_IO_BUF: + return ublk_register_io_buf(cmd, ubq, tag, ub_cmd); + case UBLK_IO_UNREGISTER_IO_BUF: + return ublk_unregister_io_buf(cmd, ubq, tag, ub_cmd); case UBLK_IO_FETCH_REQ: /* UBLK_IO_FETCH_REQ is only allowed before queue is setup */ if (ublk_queue_ready(ubq)) { @@ -1872,36 +1978,6 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd, return -EIOCBQUEUED; } -static inline struct request *__ublk_check_and_get_req(struct ublk_device *ub, - struct ublk_queue *ubq, int tag, size_t offset) -{ - struct request *req; - - if (!ublk_need_req_ref(ubq)) - return NULL; - - req = blk_mq_tag_to_rq(ub->tag_set.tags[ubq->q_id], tag); - if (!req) - return NULL; - - if (!ublk_get_req_ref(ubq, req)) - return NULL; - - if (unlikely(!blk_mq_request_started(req) || req->tag != tag)) - goto fail_put; - - if (!ublk_rq_has_data(req)) - goto fail_put; - - if (offset > blk_rq_bytes(req)) - goto fail_put; - - return req; -fail_put: - ublk_put_req_ref(ubq, req); - return NULL; -} - static inline int ublk_ch_uring_cmd_local(struct io_uring_cmd *cmd, unsigned int issue_flags) { @@ -2527,9 +2603,6 @@ static int ublk_ctrl_add_dev(struct io_uring_cmd *cmd) goto out_free_dev_number; } - /* We are not ready to support zero copy */ - ub->dev_info.flags &= ~UBLK_F_SUPPORT_ZERO_COPY; - ub->dev_info.nr_hw_queues = min_t(unsigned int, ub->dev_info.nr_hw_queues, nr_cpu_ids); ublk_align_max_io_size(ub); @@ -2860,7 +2933,7 @@ static int ublk_ctrl_get_features(struct io_uring_cmd *cmd) { const struct ublksrv_ctrl_cmd *header = io_uring_sqe_cmd(cmd->sqe); void __user *argp = (void __user *)(unsigned long)header->addr; - u64 features = UBLK_F_ALL & ~UBLK_F_SUPPORT_ZERO_COPY; + u64 features = UBLK_F_ALL; if (header->len != UBLK_FEATURES_LEN || 
	    !header->addr)
 		return -EINVAL;
 
diff --git a/include/uapi/linux/ublk_cmd.h b/include/uapi/linux/ublk_cmd.h
index a8bc98bb69fce..74246c926b55f 100644
--- a/include/uapi/linux/ublk_cmd.h
+++ b/include/uapi/linux/ublk_cmd.h
@@ -94,6 +94,10 @@
 	_IOWR('u', UBLK_IO_COMMIT_AND_FETCH_REQ, struct ublksrv_io_cmd)
 #define UBLK_U_IO_NEED_GET_DATA \
 	_IOWR('u', UBLK_IO_NEED_GET_DATA, struct ublksrv_io_cmd)
+#define UBLK_U_IO_REGISTER_IO_BUF \
+	_IOWR('u', 0x23, struct ublksrv_io_cmd)
+#define UBLK_U_IO_UNREGISTER_IO_BUF \
+	_IOWR('u', 0x24, struct ublksrv_io_cmd)
 
 /* only ABORT means that no re-fetch */
 #define UBLK_IO_RES_OK		0

From patchwork Tue Feb 11 00:56:45 2025
X-Patchwork-Submitter: Keith Busch
X-Patchwork-Id: 13968712
From: Keith Busch
Subject: [PATCHv2 5/6] io_uring: add abstraction for buf_table rsrc data
Date: Mon, 10 Feb 2025 16:56:45 -0800
Message-ID: <20250211005646.222452-6-kbusch@meta.com>
In-Reply-To: <20250211005646.222452-1-kbusch@meta.com>
References: <20250211005646.222452-1-kbusch@meta.com>

From: Keith Busch

We'll need to add more fields specific to the registered buffers, so
make a layer for it now. No functional change in this patch.

Signed-off-by: Keith Busch
---
 include/linux/io_uring_types.h |  6 +++-
 io_uring/fdinfo.c              |  8 +++---
 io_uring/net.c                 |  2 +-
 io_uring/nop.c                 |  2 +-
 io_uring/register.c            |  2 +-
 io_uring/rsrc.c                | 51 +++++++++++++++++-----------------
 io_uring/rw.c                  |  2 +-
 io_uring/uring_cmd.c           |  2 +-
 8 files changed, 39 insertions(+), 36 deletions(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 99aac2d52fbae..4f4b7ad21500d 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -67,6 +67,10 @@ struct io_file_table {
 	unsigned int alloc_hint;
 };
 
+struct io_buf_table {
+	struct io_rsrc_data data;
+};
+
 struct io_hash_bucket {
 	struct hlist_head list;
 } ____cacheline_aligned_in_smp;
@@ -291,7 +295,7 @@ struct io_ring_ctx {
 	struct io_wq_work_list	iopoll_list;
 
 	struct io_file_table	file_table;
-	struct io_rsrc_data	buf_table;
+	struct io_buf_table	buf_table;
 
 	struct io_submit_state	submit_state;
 
diff --git a/io_uring/fdinfo.c b/io_uring/fdinfo.c
index f60d0a9d505e2..d389c06cbce10 100644
--- a/io_uring/fdinfo.c
+++ b/io_uring/fdinfo.c
@@ -217,12 +217,12 @@ __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *file)
 			seq_puts(m, "\n");
 		}
 	}
-	seq_printf(m, "UserBufs:\t%u\n", ctx->buf_table.nr);
-	for (i = 0; has_lock && i < ctx->buf_table.nr; i++) {
+	seq_printf(m, "UserBufs:\t%u\n", ctx->buf_table.data.nr);
+	for (i = 0; has_lock && i < ctx->buf_table.data.nr; i++) {
 		struct io_mapped_ubuf *buf = NULL;
 
-		if (ctx->buf_table.nodes[i])
-			buf = ctx->buf_table.nodes[i]->buf;
+		if (ctx->buf_table.data.nodes[i])
+			buf = ctx->buf_table.data.nodes[i]->buf;
 		if (buf)
 			seq_printf(m, "%5u: 0x%llx/%u\n", i, buf->ubuf, buf->len);
 		else
diff --git a/io_uring/net.c b/io_uring/net.c
index 280d576e89249..c1020c857333d 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -1366,7 +1366,7 @@ static int io_send_zc_import(struct io_kiocb *req, unsigned int issue_flags)
 		ret = -EFAULT;
 		io_ring_submit_lock(ctx, issue_flags);
-		node = io_rsrc_node_lookup(&ctx->buf_table, sr->buf_index);
+		node = io_rsrc_node_lookup(&ctx->buf_table.data, sr->buf_index);
 		if (node) {
 			io_req_assign_buf_node(sr->notif, node);
 			ret = 0;
diff --git a/io_uring/nop.c b/io_uring/nop.c
index 5e5196df650a1..e3ebe5f019076 100644
--- a/io_uring/nop.c
+++ b/io_uring/nop.c
@@ -69,7 +69,7 @@ int io_nop(struct io_kiocb *req, unsigned int issue_flags)
 		ret = -EFAULT;
 		io_ring_submit_lock(ctx, issue_flags);
-		node = io_rsrc_node_lookup(&ctx->buf_table, nop->buffer);
+		node = io_rsrc_node_lookup(&ctx->buf_table.data, nop->buffer);
 		if (node) {
 			io_req_assign_buf_node(req, node);
 			ret = 0;
diff --git a/io_uring/register.c b/io_uring/register.c
index 9a4d2fbce4aec..fa922b1b26583 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -919,7 +919,7 @@ SYSCALL_DEFINE4(io_uring_register, unsigned int, fd, unsigned int, opcode,
 	ret = __io_uring_register(ctx, opcode, arg, nr_args);
 
 	trace_io_uring_register(ctx, opcode, ctx->file_table.data.nr,
-				ctx->buf_table.nr, ret);
+				ctx->buf_table.data.nr, ret);
 	mutex_unlock(&ctx->uring_lock);
 	fput(file);
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 14efec8587888..b3f36f1b2a668 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -232,17 +232,17 @@ static int __io_sqe_buffers_update(struct io_ring_ctx *ctx,
 	__u32 done;
 	int i, err;
 
-	if (!ctx->buf_table.nr)
+	if (!ctx->buf_table.data.nr)
 		return -ENXIO;
-	if (up->offset + nr_args > ctx->buf_table.nr)
+	if (up->offset + nr_args > ctx->buf_table.data.nr)
 		return -EINVAL;
 
 	for (done = 0; done < nr_args; done++) {
 		struct io_rsrc_node *node;
 		u64 tag = 0;
 
-		i = array_index_nospec(up->offset + done, ctx->buf_table.nr);
-		node = io_rsrc_node_lookup(&ctx->buf_table, i);
+		i = array_index_nospec(up->offset + done, ctx->buf_table.data.nr);
+		node = io_rsrc_node_lookup(&ctx->buf_table.data, i);
 		if (node && node->type != IORING_RSRC_BUFFER) {
 			err = -EBUSY;
 			break;
@@ -273,8 +273,8 @@ static int __io_sqe_buffers_update(struct io_ring_ctx *ctx,
 			}
 			node->tag = tag;
 		}
-		io_reset_rsrc_node(ctx, &ctx->buf_table, i);
-		ctx->buf_table.nodes[i] = node;
+		io_reset_rsrc_node(ctx, &ctx->buf_table.data, i);
+		ctx->buf_table.data.nodes[i] = node;
 		if (ctx->compat)
 			user_data += sizeof(struct compat_iovec);
 		else
@@ -555,9 +555,9 @@ int io_sqe_files_register(struct io_ring_ctx *ctx, void __user *arg,
 
 int io_sqe_buffers_unregister(struct io_ring_ctx *ctx)
 {
-	if (!ctx->buf_table.nr)
+	if (!ctx->buf_table.data.nr)
 		return -ENXIO;
-	io_rsrc_data_free(ctx, &ctx->buf_table);
+	io_rsrc_data_free(ctx, &ctx->buf_table.data);
 	return 0;
 }
 
@@ -584,8 +584,8 @@ static bool headpage_already_acct(struct io_ring_ctx *ctx, struct page **pages,
 	}
 
 	/* check previously registered pages */
-	for (i = 0; i < ctx->buf_table.nr; i++) {
-		struct io_rsrc_node *node = ctx->buf_table.nodes[i];
+	for (i = 0; i < ctx->buf_table.data.nr; i++) {
+		struct io_rsrc_node *node = ctx->buf_table.data.nodes[i];
 		struct io_mapped_ubuf *imu;
 
 		if (!node)
@@ -811,7 +811,7 @@ int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
 
 	BUILD_BUG_ON(IORING_MAX_REG_BUFFERS >= (1u << 16));
 
-	if (ctx->buf_table.nr)
+	if (ctx->buf_table.data.nr)
 		return -EBUSY;
 	if (!nr_args || nr_args > IORING_MAX_REG_BUFFERS)
 		return -EINVAL;
@@ -864,7 +864,7 @@ int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
 		data.nodes[i] = node;
 	}
 
-	ctx->buf_table = data;
+	ctx->buf_table.data = data;
 	if (ret)
 		io_sqe_buffers_unregister(ctx);
 	return ret;
@@ -873,7 +873,7 @@ int io_buffer_register_bvec(struct io_ring_ctx *ctx, struct request *rq,
			    void (*release)(void *), unsigned int index)
 {
-	struct io_rsrc_data *data = &ctx->buf_table;
+	struct io_rsrc_data *data = &ctx->buf_table.data;
 	struct req_iterator rq_iter;
 	struct io_mapped_ubuf *imu;
 	struct io_rsrc_node *node;
@@ -924,7 +924,7 @@ EXPORT_SYMBOL_GPL(io_buffer_register_bvec);
 
 void io_buffer_unregister_bvec(struct io_ring_ctx *ctx, unsigned int index)
 {
-	struct io_rsrc_data *data = &ctx->buf_table;
+	struct io_rsrc_data *data = &ctx->buf_table.data;
 	struct io_rsrc_node *node;
 
 	lockdep_assert_held(&ctx->uring_lock);
@@ -1040,10 +1040,10 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
 	if (!arg->nr && (arg->dst_off || arg->src_off))
 		return -EINVAL;
 	/* not allowed unless REPLACE is set */
-	if (ctx->buf_table.nr && !(arg->flags & IORING_REGISTER_DST_REPLACE))
+	if (ctx->buf_table.data.nr && !(arg->flags & IORING_REGISTER_DST_REPLACE))
 		return -EBUSY;
 
-	nbufs = src_ctx->buf_table.nr;
+	nbufs = src_ctx->buf_table.data.nr;
 	if (!arg->nr)
 		arg->nr = nbufs;
 	else if (arg->nr > nbufs)
@@ -1053,13 +1053,13 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
 	if (check_add_overflow(arg->nr, arg->dst_off, &nbufs))
 		return -EOVERFLOW;
 
-	ret = io_rsrc_data_alloc(&data, max(nbufs, ctx->buf_table.nr));
+	ret = io_rsrc_data_alloc(&data, max(nbufs, ctx->buf_table.data.nr));
 	if (ret)
 		return ret;
 
 	/* Fill entries in data from dst that won't overlap with src */
-	for (i = 0; i < min(arg->dst_off, ctx->buf_table.nr); i++) {
-		struct io_rsrc_node *src_node = ctx->buf_table.nodes[i];
+	for (i = 0; i < min(arg->dst_off, ctx->buf_table.data.nr); i++) {
+		struct io_rsrc_node *src_node = ctx->buf_table.data.nodes[i];
 
 		if (src_node) {
 			data.nodes[i] = src_node;
@@ -1068,7 +1068,7 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
 	}
 
 	ret = -ENXIO;
-	nbufs = src_ctx->buf_table.nr;
+	nbufs = src_ctx->buf_table.data.nr;
 	if (!nbufs)
 		goto out_free;
 	ret = -EINVAL;
@@ -1088,7 +1088,7 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
 	while (nr--) {
 		struct io_rsrc_node *dst_node, *src_node;
 
-		src_node = io_rsrc_node_lookup(&src_ctx->buf_table, i);
+		src_node = io_rsrc_node_lookup(&src_ctx->buf_table.data, i);
 		if (!src_node) {
 			dst_node = NULL;
 		} else {
@@ -1110,7 +1110,7 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
	 * old and new nodes at this point.
	 */
 	if (arg->flags & IORING_REGISTER_DST_REPLACE)
-		io_rsrc_data_free(ctx, &ctx->buf_table);
+		io_sqe_buffers_unregister(ctx);
 
 	/*
	 * ctx->buf_table must be empty now - either the contents are being
@@ -1118,10 +1118,9 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
	 * copied to a ring that does not have buffers yet (checked at function
	 * entry).
	 */
-	WARN_ON_ONCE(ctx->buf_table.nr);
-	ctx->buf_table = data;
+	WARN_ON_ONCE(ctx->buf_table.data.nr);
+	ctx->buf_table.data = data;
 	return 0;
-
 out_free:
 	io_rsrc_data_free(ctx, &data);
 	return ret;
@@ -1146,7 +1145,7 @@ int io_register_clone_buffers(struct io_ring_ctx *ctx, void __user *arg)
 		return -EFAULT;
 	if (buf.flags & ~(IORING_REGISTER_SRC_REGISTERED|IORING_REGISTER_DST_REPLACE))
 		return -EINVAL;
-	if (!(buf.flags & IORING_REGISTER_DST_REPLACE) && ctx->buf_table.nr)
+	if (!(buf.flags & IORING_REGISTER_DST_REPLACE) && ctx->buf_table.data.nr)
 		return -EBUSY;
 	if (memchr_inv(buf.pad, 0, sizeof(buf.pad)))
 		return -EINVAL;
diff --git a/io_uring/rw.c b/io_uring/rw.c
index c25e0ab5c996b..38ec32401a558 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -363,7 +363,7 @@ static int io_prep_rw_fixed(struct io_kiocb *req, const struct io_uring_sqe *sqe
 	if (unlikely(ret))
 		return ret;
 
-	node = io_rsrc_node_lookup(&ctx->buf_table, req->buf_index);
+	node = io_rsrc_node_lookup(&ctx->buf_table.data, req->buf_index);
 	if (!node)
 		return -EFAULT;
 	io_req_assign_buf_node(req, node);
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index aebbe2a4c7183..5d9719402b49b 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -206,7 +206,7 @@ int io_uring_cmd_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 		struct io_rsrc_node *node;
 		u16 index = READ_ONCE(sqe->buf_index);
 
-		node = io_rsrc_node_lookup(&ctx->buf_table, index);
+		node = io_rsrc_node_lookup(&ctx->buf_table.data, index);
 		if (unlikely(!node))
 			return -EFAULT;
 		/*

From patchwork Tue Feb 11 00:56:46 2025
X-Patchwork-Submitter: Keith Busch
X-Patchwork-Id: 13968717
From: Keith Busch
Subject: [PATCHv2 6/6] io_uring: cache nodes and mapped buffers
Date: Mon, 10 Feb 2025 16:56:46 -0800
Message-ID: <20250211005646.222452-7-kbusch@meta.com>
In-Reply-To: <20250211005646.222452-1-kbusch@meta.com>
References: <20250211005646.222452-1-kbusch@meta.com>

From: Keith Busch

Frequent alloc/free cycles on these are pretty costly. Use an io cache
to more efficiently reuse these buffers.
Signed-off-by: Keith Busch
---
 include/linux/io_uring_types.h |  18 +++---
 io_uring/filetable.c           |   2 +-
 io_uring/rsrc.c                | 115 +++++++++++++++++++++++++--------
 io_uring/rsrc.h                |   2 +-
 4 files changed, 101 insertions(+), 36 deletions(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 4f4b7ad21500d..a6e525b756d10 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -67,8 +67,18 @@ struct io_file_table {
 	unsigned int alloc_hint;
 };
 
+struct io_alloc_cache {
+	void **entries;
+	unsigned int nr_cached;
+	unsigned int max_cached;
+	size_t elem_size;
+	unsigned int init_clear;
+};
+
 struct io_buf_table {
 	struct io_rsrc_data data;
+	struct io_alloc_cache node_cache;
+	struct io_alloc_cache imu_cache;
 };
 
 struct io_hash_bucket {
@@ -222,14 +232,6 @@ struct io_submit_state {
 	struct blk_plug plug;
 };
 
-struct io_alloc_cache {
-	void **entries;
-	unsigned int nr_cached;
-	unsigned int max_cached;
-	unsigned int elem_size;
-	unsigned int init_clear;
-};
-
 struct io_ring_ctx {
 	/* const or read-mostly hot data */
 	struct {
diff --git a/io_uring/filetable.c b/io_uring/filetable.c
index dd8eeec97acf6..a21660e3145ab 100644
--- a/io_uring/filetable.c
+++ b/io_uring/filetable.c
@@ -68,7 +68,7 @@ static int io_install_fixed_file(struct io_ring_ctx *ctx, struct file *file,
 	if (slot_index >= ctx->file_table.data.nr)
 		return -EINVAL;
 
-	node = io_rsrc_node_alloc(IORING_RSRC_FILE);
+	node = io_rsrc_node_alloc(ctx, IORING_RSRC_FILE);
 	if (!node)
 		return -ENOMEM;
 
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index b3f36f1b2a668..88a67590c67d4 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -32,6 +32,8 @@ static struct io_rsrc_node *io_sqe_buffer_register(struct io_ring_ctx *ctx,
 #define IORING_MAX_FIXED_FILES	(1U << 20)
 #define IORING_MAX_REG_BUFFERS	(1U << 14)
 
+#define IO_CACHED_BVECS_SEGS	30
+
 int __io_account_mem(struct user_struct *user, unsigned long nr_pages)
 {
 	unsigned long page_limit, cur_pages, new_pages;
@@ -119,19 +121,35 @@ static void io_buffer_unmap(struct io_ring_ctx *ctx, struct io_rsrc_node *node)
 	}
 }
 
-struct io_rsrc_node *io_rsrc_node_alloc(int type)
+
+struct io_rsrc_node *io_rsrc_node_alloc(struct io_ring_ctx *ctx, int type)
 {
 	struct io_rsrc_node *node;
 
-	node = kzalloc(sizeof(*node), GFP_KERNEL);
+	if (type == IORING_RSRC_FILE)
+		node = kmalloc(sizeof(*node), GFP_KERNEL);
+	else
+		node = io_cache_alloc(&ctx->buf_table.node_cache, GFP_KERNEL);
 	if (node) {
 		node->type = type;
 		node->refs = 1;
+		node->tag = 0;
+		node->file_ptr = 0;
+		node->release = NULL;
+		node->priv = NULL;
 	}
 	return node;
 }
 
-__cold void io_rsrc_data_free(struct io_ring_ctx *ctx, struct io_rsrc_data *data)
+static __cold void __io_rsrc_data_free(struct io_rsrc_data *data)
+{
+	kvfree(data->nodes);
+	data->nodes = NULL;
+	data->nr = 0;
+}
+
+__cold void io_rsrc_data_free(struct io_ring_ctx *ctx,
+			      struct io_rsrc_data *data)
 {
 	if (!data->nr)
 		return;
@@ -139,9 +157,7 @@ __cold void io_rsrc_data_free(struct io_ring_ctx *ctx, struct io_rsrc_data *data
 		if (data->nodes[data->nr])
 			io_put_rsrc_node(ctx, data->nodes[data->nr]);
 	}
-	kvfree(data->nodes);
-	data->nodes = NULL;
-	data->nr = 0;
+	__io_rsrc_data_free(data);
 }
 
 __cold int io_rsrc_data_alloc(struct io_rsrc_data *data, unsigned nr)
@@ -155,6 +171,34 @@ __cold int io_rsrc_data_alloc(struct io_rsrc_data *data, unsigned nr)
 	return -ENOMEM;
 }
 
+static __cold int io_rsrc_buffer_alloc(struct io_buf_table *table, unsigned nr)
+{
+	const int imu_cache_size = struct_size_t(struct io_mapped_ubuf, bvec,
+						 IO_CACHED_BVECS_SEGS);
+	int ret;
+
+	BUILD_BUG_ON(imu_cache_size != 512);
+	ret = io_rsrc_data_alloc(&table->data, nr);
+	if (ret)
+		return ret;
+
+	ret = io_alloc_cache_init(&table->node_cache, nr,
+				  sizeof(struct io_rsrc_node), 0);
+	if (ret)
+		goto out_1;
+
+	ret = io_alloc_cache_init(&table->imu_cache, nr, imu_cache_size, 0);
+	if (ret)
+		goto out_2;
+
+	return 0;
+out_2:
+	io_alloc_cache_free(&table->node_cache, kfree);
+out_1:
+	__io_rsrc_data_free(&table->data);
+	return ret;
+}
+
 static int __io_sqe_files_update(struct io_ring_ctx *ctx,
				 struct io_uring_rsrc_update2 *up,
				 unsigned nr_args)
@@ -204,7 +248,7 @@ static int __io_sqe_files_update(struct io_ring_ctx *ctx,
 			err = -EBADF;
 			break;
 		}
-		node = io_rsrc_node_alloc(IORING_RSRC_FILE);
+		node = io_rsrc_node_alloc(ctx, IORING_RSRC_FILE);
 		if (!node) {
 			err = -ENOMEM;
 			fput(file);
@@ -465,6 +509,8 @@ void io_free_rsrc_node(struct io_ring_ctx *ctx, struct io_rsrc_node *node)
 		io_buffer_unmap(ctx, node);
 		if (node->release)
 			node->release(node->priv);
+		if (io_alloc_cache_put(&ctx->buf_table.node_cache, node))
+			return;
 		break;
 	default:
 		WARN_ON_ONCE(1);
@@ -533,7 +579,7 @@ int io_sqe_files_register(struct io_ring_ctx *ctx, void __user *arg,
 			goto fail;
 		}
 		ret = -ENOMEM;
-		node = io_rsrc_node_alloc(IORING_RSRC_FILE);
+		node = io_rsrc_node_alloc(ctx, IORING_RSRC_FILE);
 		if (!node) {
 			fput(file);
 			goto fail;
@@ -553,11 +599,19 @@ int io_sqe_files_register(struct io_ring_ctx *ctx, void __user *arg,
 	return ret;
 }
 
+static void io_rsrc_buffer_free(struct io_ring_ctx *ctx,
+				struct io_buf_table *table)
+{
+	io_rsrc_data_free(ctx, &table->data);
+	io_alloc_cache_free(&table->node_cache, kfree);
+	io_alloc_cache_free(&table->imu_cache, kfree);
+}
+
 int io_sqe_buffers_unregister(struct io_ring_ctx *ctx)
 {
 	if (!ctx->buf_table.data.nr)
 		return -ENXIO;
-	io_rsrc_data_free(ctx, &ctx->buf_table.data);
+	io_rsrc_buffer_free(ctx, &ctx->buf_table);
 	return 0;
 }
 
@@ -722,6 +776,15 @@ bool io_check_coalesce_buffer(struct page **page_array, int nr_pages,
 	return true;
 }
 
+static struct io_mapped_ubuf *io_alloc_imu(struct io_ring_ctx *ctx,
+					   int nr_bvecs)
+{
+	if (nr_bvecs <= IO_CACHED_BVECS_SEGS)
+		return io_cache_alloc(&ctx->buf_table.imu_cache, GFP_KERNEL);
+	return kvmalloc(struct_size_t(struct io_mapped_ubuf, bvec, nr_bvecs),
+			GFP_KERNEL);
+}
+
 static struct io_rsrc_node *io_sqe_buffer_register(struct io_ring_ctx *ctx,
						   struct iovec *iov,
						   struct page **last_hpage)
@@ -738,7 +801,7 @@ static struct io_rsrc_node *io_sqe_buffer_register(struct io_ring_ctx *ctx,
 	if (!iov->iov_base)
 		return NULL;
 
-	node = io_rsrc_node_alloc(IORING_RSRC_BUFFER);
+	node = io_rsrc_node_alloc(ctx, IORING_RSRC_BUFFER);
 	if (!node)
 		return ERR_PTR(-ENOMEM);
 	node->buf = NULL;
@@ -758,7 +821,7 @@ static struct io_rsrc_node *io_sqe_buffer_register(struct io_ring_ctx *ctx,
		coalesced = io_coalesce_buffer(&pages, &nr_pages, &data);
	}
 
-	imu = kvmalloc(struct_size(imu, bvec, nr_pages), GFP_KERNEL);
+	imu = io_alloc_imu(ctx, nr_pages);
 	if (!imu)
 		goto done;
 
@@ -804,9 +867,9 @@ int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
			    unsigned int nr_args, u64 __user *tags)
 {
 	struct page *last_hpage = NULL;
-	struct io_rsrc_data data;
 	struct iovec fast_iov, *iov = &fast_iov;
 	const struct iovec __user *uvec;
+	struct io_buf_table table;
 	int i, ret;
 
 	BUILD_BUG_ON(IORING_MAX_REG_BUFFERS >= (1u << 16));
@@ -815,13 +878,14 @@ int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
 		return -EBUSY;
 	if (!nr_args || nr_args > IORING_MAX_REG_BUFFERS)
 		return -EINVAL;
-	ret = io_rsrc_data_alloc(&data, nr_args);
+	ret = io_rsrc_buffer_alloc(&table, nr_args);
 	if (ret)
 		return ret;
 
 	if (!arg)
 		memset(iov, 0, sizeof(*iov));
 
+	ctx->buf_table = table;
 	for (i = 0; i < nr_args; i++) {
 		struct io_rsrc_node *node;
 		u64 tag = 0;
@@ -861,10 +925,8 @@ int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
 			}
 			node->tag = tag;
 		}
-		data.nodes[i] = node;
+		table.data.nodes[i] = node;
 	}
-
-	ctx->buf_table.data = data;
 	if (ret)
 		io_sqe_buffers_unregister(ctx);
 	return ret;
@@ -892,7 +954,7 @@ int io_buffer_register_bvec(struct io_ring_ctx *ctx, struct request *rq,
 	if (node)
 		return -EBUSY;
 
-	node = io_rsrc_node_alloc(IORING_RSRC_KBUFFER);
+	node = io_rsrc_node_alloc(ctx, IORING_RSRC_KBUFFER);
 	if (!node)
 		return -ENOMEM;
 
@@ -900,7 +962,8 @@ int io_buffer_register_bvec(struct io_ring_ctx *ctx, struct request *rq,
 	node->priv = rq;
 
 	nr_bvecs = blk_rq_nr_phys_segments(rq);
-	imu = kvmalloc(struct_size(imu, bvec, nr_bvecs), GFP_KERNEL);
+
+	imu = io_alloc_imu(ctx, nr_bvecs);
 	if (!imu) {
 		kfree(node);
 		return -ENOMEM;
@@ -1022,7 +1085,7 @@ static void lock_two_rings(struct io_ring_ctx *ctx1, struct io_ring_ctx *ctx2)
 static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx,
			    struct io_uring_clone_buffers *arg)
 {
-	struct io_rsrc_data data;
+	struct io_buf_table table;
 	int i, ret, off, nr;
 	unsigned int nbufs;
 
@@ -1053,7 +1116,7 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
 	if (check_add_overflow(arg->nr, arg->dst_off, &nbufs))
 		return -EOVERFLOW;
 
-	ret = io_rsrc_data_alloc(&data, max(nbufs, ctx->buf_table.data.nr));
+	ret = io_rsrc_buffer_alloc(&table, max(nbufs, ctx->buf_table.data.nr));
 	if (ret)
 		return ret;
 
@@ -1062,7 +1125,7 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
 		struct io_rsrc_node *src_node = ctx->buf_table.data.nodes[i];
 
 		if (src_node) {
-			data.nodes[i] = src_node;
+			table.data.nodes[i] = src_node;
 			src_node->refs++;
 		}
 	}
@@ -1092,7 +1155,7 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
 		if (!src_node) {
 			dst_node = NULL;
 		} else {
-			dst_node = io_rsrc_node_alloc(src_node->type);
+			dst_node = io_rsrc_node_alloc(ctx, src_node->type);
 			if (!dst_node) {
 				ret = -ENOMEM;
 				goto out_free;
@@ -1101,12 +1164,12 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
 			refcount_inc(&src_node->buf->refs);
 			dst_node->buf = src_node->buf;
 		}
-		data.nodes[off++] = dst_node;
+		table.data.nodes[off++] = dst_node;
 		i++;
 	}
 
 	/*
-	 * If asked for replace, put the old table. data->nodes[] holds both
+	 * If asked for replace, put the old table. table.data->nodes[] holds both
	 * old and new nodes at this point.
	 */
 	if (arg->flags & IORING_REGISTER_DST_REPLACE)
@@ -1119,10 +1182,10 @@ static int io_clone_buffers(struct io_ring_ctx *ctx, struct io_ring_ctx *src_ctx
	 * entry).
	 */
 	WARN_ON_ONCE(ctx->buf_table.data.nr);
-	ctx->buf_table.data = data;
+	ctx->buf_table = table;
 	return 0;
 out_free:
-	io_rsrc_data_free(ctx, &data);
+	io_rsrc_buffer_free(ctx, &table);
 	return ret;
 }
 
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index 8147dfc26f737..751db2ce9affb 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -49,7 +49,7 @@ struct io_imu_folio_data {
 	unsigned int nr_folios;
 };
 
-struct io_rsrc_node *io_rsrc_node_alloc(int type);
+struct io_rsrc_node *io_rsrc_node_alloc(struct io_ring_ctx *ctx, int type);
 void io_free_rsrc_node(struct io_ring_ctx *ctx, struct io_rsrc_node *node);
 void io_rsrc_data_free(struct io_ring_ctx *ctx, struct io_rsrc_data *data);
 int io_rsrc_data_alloc(struct io_rsrc_data *data, unsigned nr);