From patchwork Fri Oct 25 14:02:28 2024
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13850755
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: jannh@google.com, Jens Axboe
Subject: [PATCH 1/4] io_uring: move max entry definition and ring sizing into header
Date: Fri, 25 Oct 2024 08:02:28 -0600
Message-ID: <20241025140502.167623-3-axboe@kernel.dk>
In-Reply-To: <20241025140502.167623-2-axboe@kernel.dk>
References: <20241025140502.167623-2-axboe@kernel.dk>

In preparation for needing this somewhere else, move the definitions for
the maximum CQ and SQ ring size into io_uring.h. Make the rings_size()
helper available as well, and have it take just the setup flags argument
rather than the full io_ring_ctx pointer. That's all that is needed.

Signed-off-by: Jens Axboe
---
 io_uring/io_uring.c | 14 ++++++--------
 io_uring/io_uring.h |  5 +++++
 2 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 58b401900b41..6dea5242d666 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -105,9 +105,6 @@
 #include "alloc_cache.h"
 #include "eventfd.h"
 
-#define IORING_MAX_ENTRIES	32768
-#define IORING_MAX_CQ_ENTRIES	(2 * IORING_MAX_ENTRIES)
-
 #define SQE_COMMON_FLAGS (IOSQE_FIXED_FILE | IOSQE_IO_LINK | \
			  IOSQE_IO_HARDLINK | IOSQE_ASYNC)
 
@@ -2667,8 +2664,8 @@ static void io_rings_free(struct io_ring_ctx *ctx)
 	ctx->sq_sqes = NULL;
 }
 
-static unsigned long rings_size(struct io_ring_ctx *ctx, unsigned int sq_entries,
-				unsigned int cq_entries, size_t *sq_offset)
+unsigned long rings_size(unsigned int flags, unsigned int sq_entries,
+			 unsigned int cq_entries, size_t *sq_offset)
 {
 	struct io_rings *rings;
 	size_t off, sq_array_size;
@@ -2676,7 +2673,7 @@ static unsigned long rings_size(struct io_ring_ctx *ctx, unsigned int sq_entries
 	off = struct_size(rings, cqes, cq_entries);
 	if (off == SIZE_MAX)
 		return SIZE_MAX;
-	if (ctx->flags & IORING_SETUP_CQE32) {
+	if (flags & IORING_SETUP_CQE32) {
 		if (check_shl_overflow(off, 1, &off))
 			return SIZE_MAX;
 	}
@@ -2687,7 +2684,7 @@ static unsigned long rings_size(struct io_ring_ctx *ctx, unsigned int sq_entries
 		return SIZE_MAX;
 #endif
 
-	if (ctx->flags & IORING_SETUP_NO_SQARRAY) {
+	if (flags & IORING_SETUP_NO_SQARRAY) {
 		*sq_offset = SIZE_MAX;
 		return off;
 	}
@@ -3434,7 +3431,8 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
 	ctx->sq_entries = p->sq_entries;
 	ctx->cq_entries = p->cq_entries;
 
-	size = rings_size(ctx, p->sq_entries, p->cq_entries, &sq_array_offset);
+	size = rings_size(ctx->flags, p->sq_entries, p->cq_entries,
+			  &sq_array_offset);
 	if (size == SIZE_MAX)
 		return -EOVERFLOW;
 
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 9cd9a127e9ed..4a471a810f02 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -65,6 +65,11 @@ static inline bool io_should_wake(struct io_wait_queue *iowq)
 	return dist >= 0 || atomic_read(&ctx->cq_timeouts) != iowq->nr_timeouts;
 }
 
+#define IORING_MAX_ENTRIES	32768
+#define IORING_MAX_CQ_ENTRIES	(2 * IORING_MAX_ENTRIES)
+
+unsigned long rings_size(unsigned int flags, unsigned int sq_entries,
+			 unsigned int cq_entries, size_t *sq_offset);
 bool io_cqe_cache_refill(struct io_ring_ctx *ctx, bool overflow);
 int io_run_task_work_sig(struct io_ring_ctx *ctx);
 void io_req_defer_failed(struct io_kiocb *req, s32 res);
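
For illustration, a minimal sketch of how a caller outside io_uring.c can
now use the exported helper; the wrapper name example_rings_bytes() is
hypothetical and not part of this patch:

	/* Sketch: compute the rings allocation size from setup flags alone. */
	static int example_rings_bytes(unsigned int flags, unsigned int sq_entries,
				       unsigned int cq_entries, unsigned long *bytes)
	{
		size_t sq_array_offset;
		unsigned long size;

		size = rings_size(flags, sq_entries, cq_entries, &sq_array_offset);
		if (size == SIZE_MAX)
			return -EOVERFLOW;	/* size computation overflowed */
		/*
		 * With IORING_SETUP_NO_SQARRAY, sq_array_offset comes back as
		 * SIZE_MAX since no SQ array is appended after the CQEs.
		 */
		*bytes = size;
		return 0;
	}
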
From patchwork Fri Oct 25 14:02:29 2024
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13850756
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: jannh@google.com, Jens Axboe
Subject: [PATCH 2/4] io_uring: abstract out a bit of the ring filling logic
Date: Fri, 25 Oct 2024 08:02:29 -0600
Message-ID: <20241025140502.167623-4-axboe@kernel.dk>
In-Reply-To: <20241025140502.167623-2-axboe@kernel.dk>
References: <20241025140502.167623-2-axboe@kernel.dk>

Abstract out an io_uring_fill_params() helper, which fills out the
necessary bits of struct io_uring_params. Add it to io_uring.h as well,
in preparation for having another internal user of it.

Signed-off-by: Jens Axboe
---
 io_uring/io_uring.c | 70 ++++++++++++++++++++++++++-------------------
 io_uring/io_uring.h |  1 +
 2 files changed, 41 insertions(+), 30 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 6dea5242d666..b5974bdad48b 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3498,14 +3498,8 @@ static struct file *io_uring_get_file(struct io_ring_ctx *ctx)
				 O_RDWR | O_CLOEXEC, NULL);
 }
 
-static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
-				  struct io_uring_params __user *params)
+int io_uring_fill_params(unsigned entries, struct io_uring_params *p)
 {
-	struct io_ring_ctx *ctx;
-	struct io_uring_task *tctx;
-	struct file *file;
-	int ret;
-
 	if (!entries)
 		return -EINVAL;
 	if (entries > IORING_MAX_ENTRIES) {
@@ -3547,6 +3541,42 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
 		p->cq_entries = 2 * p->sq_entries;
 	}
 
+	p->sq_off.head = offsetof(struct io_rings, sq.head);
+	p->sq_off.tail = offsetof(struct io_rings, sq.tail);
+	p->sq_off.ring_mask = offsetof(struct io_rings, sq_ring_mask);
+	p->sq_off.ring_entries = offsetof(struct io_rings, sq_ring_entries);
+	p->sq_off.flags = offsetof(struct io_rings, sq_flags);
+	p->sq_off.dropped = offsetof(struct io_rings, sq_dropped);
+	p->sq_off.resv1 = 0;
+	if (!(p->flags & IORING_SETUP_NO_MMAP))
+		p->sq_off.user_addr = 0;
+
+	p->cq_off.head = offsetof(struct io_rings, cq.head);
+	p->cq_off.tail = offsetof(struct io_rings, cq.tail);
+	p->cq_off.ring_mask = offsetof(struct io_rings, cq_ring_mask);
+	p->cq_off.ring_entries = offsetof(struct io_rings, cq_ring_entries);
+	p->cq_off.overflow = offsetof(struct io_rings, cq_overflow);
+	p->cq_off.cqes = offsetof(struct io_rings, cqes);
+	p->cq_off.flags = offsetof(struct io_rings, cq_flags);
+	p->cq_off.resv1 = 0;
+	if (!(p->flags & IORING_SETUP_NO_MMAP))
+		p->cq_off.user_addr = 0;
+
+	return 0;
+}
+
+static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
+				  struct io_uring_params __user *params)
+{
+	struct io_ring_ctx *ctx;
+	struct io_uring_task *tctx;
+	struct file *file;
+	int ret;
+
+	ret = io_uring_fill_params(entries, p);
+	if (unlikely(ret))
+		return ret;
+
 	ctx = io_ring_ctx_alloc(p);
 	if (!ctx)
 		return -ENOMEM;
@@ -3630,6 +3660,9 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
 	if (ret)
 		goto err;
 
+	if (!(p->flags & IORING_SETUP_NO_SQARRAY))
+		p->sq_off.array = (char *)ctx->sq_array - (char *)ctx->rings;
+
 	ret = io_sq_offload_create(ctx, p);
 	if (ret)
 		goto err;
@@ -3638,29 +3671,6 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
 	if (ret)
 		goto err;
 
-	p->sq_off.head = offsetof(struct io_rings, sq.head);
-	p->sq_off.tail = offsetof(struct io_rings, sq.tail);
-	p->sq_off.ring_mask = offsetof(struct io_rings, sq_ring_mask);
-	p->sq_off.ring_entries = offsetof(struct io_rings, sq_ring_entries);
-	p->sq_off.flags = offsetof(struct io_rings, sq_flags);
-	p->sq_off.dropped = offsetof(struct io_rings, sq_dropped);
-	if (!(ctx->flags & IORING_SETUP_NO_SQARRAY))
-		p->sq_off.array = (char *)ctx->sq_array - (char *)ctx->rings;
-	p->sq_off.resv1 = 0;
-	if (!(ctx->flags & IORING_SETUP_NO_MMAP))
-		p->sq_off.user_addr = 0;
-
-	p->cq_off.head = offsetof(struct io_rings, cq.head);
-	p->cq_off.tail = offsetof(struct io_rings, cq.tail);
-	p->cq_off.ring_mask = offsetof(struct io_rings, cq_ring_mask);
-	p->cq_off.ring_entries = offsetof(struct io_rings, cq_ring_entries);
-	p->cq_off.overflow = offsetof(struct io_rings, cq_overflow);
-	p->cq_off.cqes = offsetof(struct io_rings, cqes);
-	p->cq_off.flags = offsetof(struct io_rings, cq_flags);
-	p->cq_off.resv1 = 0;
-	if (!(ctx->flags & IORING_SETUP_NO_MMAP))
-		p->cq_off.user_addr = 0;
-
 	p->features = IORING_FEAT_SINGLE_MMAP | IORING_FEAT_NODROP |
		      IORING_FEAT_SUBMIT_STABLE | IORING_FEAT_RW_CUR_POS |
		      IORING_FEAT_CUR_PERSONALITY | IORING_FEAT_FAST_POLL |
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index 4a471a810f02..e3e6cb14de5d 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -70,6 +70,7 @@ static inline bool io_should_wake(struct io_wait_queue *iowq)
 
 unsigned long rings_size(unsigned int flags, unsigned int sq_entries,
			 unsigned int cq_entries, size_t *sq_offset);
+int io_uring_fill_params(unsigned entries, struct io_uring_params *p);
 bool io_cqe_cache_refill(struct io_ring_ctx *ctx, bool overflow);
 int io_run_task_work_sig(struct io_ring_ctx *ctx);
 void io_req_defer_failed(struct io_kiocb *req, s32 res);
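
As context for what the filled-in values are used for, a sketch of the
userspace side that consumes the sq_off/cq_off tables this helper now
populates; the raw io_uring_setup(2) syscall is used, and QUEUE_DEPTH and
example_map_sq_ring() are assumptions for the example:

	#include <linux/io_uring.h>
	#include <sys/mman.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	#define QUEUE_DEPTH	64	/* hypothetical ring size */

	static int example_map_sq_ring(void)
	{
		struct io_uring_params p = {0};
		void *sq;
		int fd;

		fd = syscall(__NR_io_uring_setup, QUEUE_DEPTH, &p);
		if (fd < 0)
			return -1;
		/* offsets in p.sq_off locate the SQ ring fields in the mapping */
		sq = mmap(NULL, p.sq_off.array + p.sq_entries * sizeof(__u32),
			  PROT_READ | PROT_WRITE, MAP_SHARED | MAP_POPULATE,
			  fd, IORING_OFF_SQ_RING);
		return sq == MAP_FAILED ? -1 : 0;
	}
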
From patchwork Fri Oct 25 14:02:30 2024
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13850757
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: jannh@google.com, Jens Axboe
Subject: [PATCH 3/4] io_uring/memmap: explicitly return -EFAULT for mmap on NULL rings
Date: Fri, 25 Oct 2024 08:02:30 -0600
Message-ID: <20241025140502.167623-5-axboe@kernel.dk>
In-Reply-To: <20241025140502.167623-2-axboe@kernel.dk>
References: <20241025140502.167623-2-axboe@kernel.dk>

The later mapping will actually check this too, but in terms of code
clarity, explicitly check whether the rings and sqes are valid during
mmap validation. That makes it explicit that if they are non-NULL, they
are valid and can get mapped.

Signed-off-by: Jens Axboe
---
 io_uring/memmap.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index a0f32a255fd1..d614824e17bd 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -204,11 +204,15 @@ static void *io_uring_validate_mmap_request(struct file *file, loff_t pgoff,
 		/* Don't allow mmap if the ring was setup without it */
 		if (ctx->flags & IORING_SETUP_NO_MMAP)
 			return ERR_PTR(-EINVAL);
+		if (!ctx->rings)
+			return ERR_PTR(-EFAULT);
 		return ctx->rings;
 	case IORING_OFF_SQES:
 		/* Don't allow mmap if the ring was setup without it */
 		if (ctx->flags & IORING_SETUP_NO_MMAP)
 			return ERR_PTR(-EINVAL);
+		if (!ctx->sq_sqes)
+			return ERR_PTR(-EFAULT);
 		return ctx->sq_sqes;
 	case IORING_OFF_PBUF_RING: {
 		struct io_buffer_list *bl;
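
From the application's point of view, the visible effect is a clean mmap
failure instead of behavior that depends on the later page-mapping step;
a hedged sketch, with the ring fd and mapping length as assumed inputs:

	#include <errno.h>
	#include <sys/mman.h>
	#include <linux/io_uring.h>

	static int example_try_map(int ring_fd, size_t len)
	{
		void *ring = mmap(NULL, len, PROT_READ | PROT_WRITE,
				  MAP_SHARED | MAP_POPULATE, ring_fd,
				  IORING_OFF_SQ_RING);

		/* a NULL kernel-side rings pointer now fails cleanly; errno
		 * may be EFAULT or ENOMEM depending on which mmap stage
		 * rejects the request */
		if (ring == MAP_FAILED && (errno == EFAULT || errno == ENOMEM))
			return -1;
		return ring == MAP_FAILED ? -1 : 0;
	}
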
From patchwork Fri Oct 25 14:02:31 2024
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13850758
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: jannh@google.com, Jens Axboe
Subject: [PATCH 4/4] io_uring/register: add IORING_REGISTER_RESIZE_RINGS
Date: Fri, 25 Oct 2024 08:02:31 -0600
Message-ID: <20241025140502.167623-6-axboe@kernel.dk>
In-Reply-To: <20241025140502.167623-2-axboe@kernel.dk>
References: <20241025140502.167623-2-axboe@kernel.dk>

Once a ring has been created, the sizes of the CQ and SQ rings are
fixed. Usually this isn't a problem on the SQ ring side, as it merely
controls the number of requests that can be submitted in a single
system call, and there's rarely a need to change that.

For the CQ ring, it's a different story. For most efficient use of
io_uring, it's important that the CQ ring never overflows. This means
that applications must size it for the worst case scenario, which can
be wasteful.

Add IORING_REGISTER_RESIZE_RINGS, which allows an application to resize
the existing rings. It takes a struct io_uring_params argument, the
same one which is used to set up the ring initially, and resizes the
rings according to the sizes given.

Certain properties are always inherited from the original ring setup,
like SQE128/CQE32 and other setup options. The implementation only
allows the flags associated with how the CQ ring is sized and clamped,
IORING_SETUP_CQSIZE and IORING_SETUP_CLAMP.

Existing unconsumed SQE and CQE entries are copied as part of the
process. If either resized destination ring cannot hold the entries
already present in its source ring, the operation fails with
-EOVERFLOW.

Any register op holds ->uring_lock, which prevents new submissions, and
the swap itself holds the completion lock across moving CQ ring state.

To prevent races between mmap and ring resizing, add a mutex that's
solely used to serialize ring resize and mmap. mmap_sem can't be used
here, as a fork'ed process may be doing mmaps on the ring as well.

The ctx->resize_lock is held across mmap operations, and the resize
will grab it before swapping out the existing rings for the new ones.

Signed-off-by: Jens Axboe
---
 include/linux/io_uring_types.h |   7 ++
 include/uapi/linux/io_uring.h  |   3 +
 io_uring/io_uring.c            |   1 +
 io_uring/memmap.c              |   8 ++
 io_uring/register.c            | 215 +++++++++++++++++++++++++++++++++
 5 files changed, 234 insertions(+)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 6d3ee71bd832..841579dcdae9 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -415,6 +415,13 @@ struct io_ring_ctx {
 	/* protected by ->completion_lock */
 	unsigned		evfd_last_cq_tail;
 
+	/*
+	 * Protection for resize vs mmap races - both the mmap and resize
+	 * side will need to grab this lock, to prevent either side from
+	 * being run concurrently with the other.
+	 */
+	struct mutex		resize_lock;
+
 	/*
	 * If IORING_SETUP_NO_MMAP is used, then the below holds
	 * the gup'ed pages for the two rings, and the sqes.
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 86cb385fe0b5..c4737892c7cd 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -615,6 +615,9 @@ enum io_uring_register_op {
 	/* send MSG_RING without having a ring */
 	IORING_REGISTER_SEND_MSG_RING		= 31,
 
+	/* resize CQ ring */
+	IORING_REGISTER_RESIZE_RINGS		= 33,
+
 	/* this goes last */
 	IORING_REGISTER_LAST,
 
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index b5974bdad48b..140cd47fbdb3 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -353,6 +353,7 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 	INIT_WQ_LIST(&ctx->submit_state.compl_reqs);
 	INIT_HLIST_HEAD(&ctx->cancelable_uring_cmd);
 	io_napi_init(ctx);
+	mutex_init(&ctx->resize_lock);
 
 	return ctx;
 
diff --git a/io_uring/memmap.c b/io_uring/memmap.c
index d614824e17bd..85c66fa54956 100644
--- a/io_uring/memmap.c
+++ b/io_uring/memmap.c
@@ -251,6 +251,8 @@ __cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma)
 	unsigned int npages;
 	void *ptr;
 
+	guard(mutex)(&ctx->resize_lock);
+
 	ptr = io_uring_validate_mmap_request(file, vma->vm_pgoff, sz);
 	if (IS_ERR(ptr))
 		return PTR_ERR(ptr);
@@ -274,6 +276,7 @@ unsigned long io_uring_get_unmapped_area(struct file *filp, unsigned long addr,
					 unsigned long len, unsigned long pgoff,
					 unsigned long flags)
 {
+	struct io_ring_ctx *ctx = filp->private_data;
 	void *ptr;
 
 	/*
@@ -284,6 +287,8 @@ unsigned long io_uring_get_unmapped_area(struct file *filp, unsigned long addr,
 	if (addr)
 		return -EINVAL;
 
+	guard(mutex)(&ctx->resize_lock);
+
 	ptr = io_uring_validate_mmap_request(filp, pgoff, len);
 	if (IS_ERR(ptr))
 		return -ENOMEM;
@@ -329,8 +334,11 @@ unsigned long io_uring_get_unmapped_area(struct file *file, unsigned long addr,
					 unsigned long len, unsigned long pgoff,
					 unsigned long flags)
 {
+	struct io_ring_ctx *ctx = file->private_data;
 	void *ptr;
 
+	guard(mutex)(&ctx->resize_lock);
+
 	ptr = io_uring_validate_mmap_request(file, pgoff, len);
 	if (IS_ERR(ptr))
 		return PTR_ERR(ptr);
diff --git a/io_uring/register.c b/io_uring/register.c
index 52b2f9b74af8..fc6c94d694b2 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -29,6 +29,7 @@
 #include "napi.h"
 #include "eventfd.h"
 #include "msg_ring.h"
+#include "memmap.h"
 
 #define IORING_MAX_RESTRICTIONS	(IORING_RESTRICTION_LAST + \
				 IORING_REGISTER_LAST + IORING_OP_LAST)
@@ -361,6 +362,214 @@ static int io_register_clock(struct io_ring_ctx *ctx,
 	return 0;
 }
 
+/*
+ * State to maintain until we can swap. Both new and old state, used for
+ * either mapping or freeing.
+ */
+struct io_ring_ctx_rings {
+	unsigned short n_ring_pages;
+	unsigned short n_sqe_pages;
+	struct page **ring_pages;
+	struct page **sqe_pages;
+	struct io_uring_sqe *sq_sqes;
+	struct io_rings *rings;
+};
+
+static void io_register_free_rings(struct io_uring_params *p,
+				   struct io_ring_ctx_rings *r)
+{
+	if (!(p->flags & IORING_SETUP_NO_MMAP)) {
+		io_pages_unmap(r->rings, &r->ring_pages, &r->n_ring_pages,
+				true);
+		io_pages_unmap(r->sq_sqes, &r->sqe_pages, &r->n_sqe_pages,
+				true);
+	} else {
+		io_pages_free(&r->ring_pages, r->n_ring_pages);
+		io_pages_free(&r->sqe_pages, r->n_sqe_pages);
+		vunmap(r->rings);
+		vunmap(r->sq_sqes);
+	}
+}
+
+#define swap_old(ctx, o, n, field)		\
+	do {					\
+		(o).field = (ctx)->field;	\
+		(ctx)->field = (n).field;	\
+	} while (0)
+
+#define RESIZE_FLAGS	(IORING_SETUP_CQSIZE | IORING_SETUP_CLAMP)
+#define COPY_FLAGS	(IORING_SETUP_NO_SQARRAY | IORING_SETUP_SQE128 | \
+			 IORING_SETUP_CQE32 | IORING_SETUP_NO_MMAP)
+
+static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
+{
+	struct io_ring_ctx_rings o = { }, n = { }, *to_free = NULL;
+	size_t size, sq_array_offset;
+	struct io_uring_params p;
+	unsigned i, tail;
+	void *ptr;
+	int ret;
+
+	/* for single issuer, must be owner resizing */
+	if (ctx->flags & IORING_SETUP_SINGLE_ISSUER &&
+	    current != ctx->submitter_task)
+		return -EEXIST;
+	if (copy_from_user(&p, arg, sizeof(p)))
+		return -EFAULT;
+	if (p.flags & ~RESIZE_FLAGS)
+		return -EINVAL;
+
+	/* properties that are always inherited */
+	p.flags |= (ctx->flags & COPY_FLAGS);
+
+	ret = io_uring_fill_params(p.sq_entries, &p);
+	if (unlikely(ret))
+		return ret;
+
+	/* nothing to do, but copy params back */
+	if (p.sq_entries == ctx->sq_entries && p.cq_entries == ctx->cq_entries) {
+		if (copy_to_user(arg, &p, sizeof(p)))
+			return -EFAULT;
+		return 0;
+	}
+
+	size = rings_size(p.flags, p.sq_entries, p.cq_entries,
+				&sq_array_offset);
+	if (size == SIZE_MAX)
+		return -EOVERFLOW;
+
+	if (!(p.flags & IORING_SETUP_NO_MMAP))
+		n.rings = io_pages_map(&n.ring_pages, &n.n_ring_pages, size);
+	else
+		n.rings = __io_uaddr_map(&n.ring_pages, &n.n_ring_pages,
+						p.cq_off.user_addr, size);
+	if (IS_ERR(n.rings))
+		return PTR_ERR(n.rings);
+
+	n.rings->sq_ring_mask = p.sq_entries - 1;
+	n.rings->cq_ring_mask = p.cq_entries - 1;
+	n.rings->sq_ring_entries = p.sq_entries;
+	n.rings->cq_ring_entries = p.cq_entries;
+
+	if (copy_to_user(arg, &p, sizeof(p))) {
+		io_register_free_rings(&p, &n);
+		return -EFAULT;
+	}
+
+	if (p.flags & IORING_SETUP_SQE128)
+		size = array_size(2 * sizeof(struct io_uring_sqe), p.sq_entries);
+	else
+		size = array_size(sizeof(struct io_uring_sqe), p.sq_entries);
+	if (size == SIZE_MAX) {
+		io_register_free_rings(&p, &n);
+		return -EOVERFLOW;
+	}
+
+	if (!(p.flags & IORING_SETUP_NO_MMAP))
+		ptr = io_pages_map(&n.sqe_pages, &n.n_sqe_pages, size);
+	else
+		ptr = __io_uaddr_map(&n.sqe_pages, &n.n_sqe_pages,
+					p.sq_off.user_addr,
+					size);
+	if (IS_ERR(ptr)) {
+		io_register_free_rings(&p, &n);
+		return PTR_ERR(ptr);
+	}
+
+	/*
+	 * If using SQPOLL, park the thread
+	 */
+	if (ctx->sq_data) {
+		mutex_unlock(&ctx->uring_lock);
+		io_sq_thread_park(ctx->sq_data);
+		mutex_lock(&ctx->uring_lock);
+	}
+
+	/*
+	 * We'll do the swap. Grab the ctx->resize_lock, which will exclude
+	 * any new mmap's on the ring fd. Clear out existing mappings to prevent
+	 * mmap from seeing them, as we'll unmap them. Any attempt to mmap
+	 * existing rings beyond this point will fail. Not that it could proceed
+	 * at this point anyway, as the io_uring mmap side needs to grab the
+	 * ctx->resize_lock as well. Likewise, hold the completion lock over the
+	 * duration of the actual swap.
+	 */
+	mutex_lock(&ctx->resize_lock);
+	spin_lock(&ctx->completion_lock);
+	o.rings = ctx->rings;
+	ctx->rings = NULL;
+	o.sq_sqes = ctx->sq_sqes;
+	ctx->sq_sqes = NULL;
+
+	/*
+	 * Now copy SQ and CQ entries, if any. If either of the destination
+	 * rings can't hold what is already there, then fail the operation.
+	 */
+	n.sq_sqes = ptr;
+	tail = o.rings->sq.tail;
+	if (tail - o.rings->sq.head > p.sq_entries)
+		goto overflow;
+	for (i = o.rings->sq.head; i < tail; i++) {
+		unsigned src_head = i & (ctx->sq_entries - 1);
+		unsigned dst_head = i & n.rings->sq_ring_mask;
+
+		n.sq_sqes[dst_head] = o.sq_sqes[src_head];
+	}
+	n.rings->sq.head = o.rings->sq.head;
+	n.rings->sq.tail = o.rings->sq.tail;
+
+	tail = o.rings->cq.tail;
+	if (tail - o.rings->cq.head > p.cq_entries) {
+overflow:
+		/* restore old rings, and return -EOVERFLOW via cleanup path */
+		ctx->rings = o.rings;
+		ctx->sq_sqes = o.sq_sqes;
+		to_free = &n;
+		ret = -EOVERFLOW;
+		goto out;
+	}
+	for (i = o.rings->cq.head; i < tail; i++) {
+		unsigned src_head = i & (ctx->cq_entries - 1);
+		unsigned dst_head = i & n.rings->cq_ring_mask;
+
+		n.rings->cqes[dst_head] = o.rings->cqes[src_head];
+	}
+	n.rings->cq.head = o.rings->cq.head;
+	n.rings->cq.tail = o.rings->cq.tail;
+	/* invalidate cached cqe refill */
+	ctx->cqe_cached = ctx->cqe_sentinel = NULL;
+
+	n.rings->sq_dropped = o.rings->sq_dropped;
+	n.rings->sq_flags = o.rings->sq_flags;
+	n.rings->cq_flags = o.rings->cq_flags;
+	n.rings->cq_overflow = o.rings->cq_overflow;
+
+	/* all done, store old pointers and assign new ones */
+	if (!(ctx->flags & IORING_SETUP_NO_SQARRAY))
+		ctx->sq_array = (u32 *)((char *)n.rings + sq_array_offset);
+
+	ctx->sq_entries = p.sq_entries;
+	ctx->cq_entries = p.cq_entries;
+
+	ctx->rings = n.rings;
+	ctx->sq_sqes = n.sq_sqes;
+	swap_old(ctx, o, n, n_ring_pages);
+	swap_old(ctx, o, n, n_sqe_pages);
+	swap_old(ctx, o, n, ring_pages);
+	swap_old(ctx, o, n, sqe_pages);
+	to_free = &o;
+	ret = 0;
+out:
+	spin_unlock(&ctx->completion_lock);
+	mutex_unlock(&ctx->resize_lock);
+	io_register_free_rings(&p, to_free);
+
+	if (ctx->sq_data)
+		io_sq_thread_unpark(ctx->sq_data);
+
+	return ret;
+}
+
 static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
			       void __user *arg, unsigned nr_args)
	__releases(ctx->uring_lock)
@@ -549,6 +758,12 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
			break;
		ret = io_register_clone_buffers(ctx, arg);
		break;
+	case IORING_REGISTER_RESIZE_RINGS:
+		ret = -EINVAL;
+		if (!arg || nr_args != 1)
+			break;
+		ret = io_register_resize_rings(ctx, arg);
+		break;
	default:
		ret = -EINVAL;
		break;
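
To close out the series from the application's side, a sketch of invoking
the new opcode; the raw io_uring_register(2) syscall is used since this
predates liburing support, and resize_cq_ring() plus its choice of flags
are assumptions for the example:

	#include <linux/io_uring.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	static int resize_cq_ring(int ring_fd, unsigned sq_entries,
				  unsigned cq_entries)
	{
		struct io_uring_params p = {0};

		p.sq_entries = sq_entries;	/* consumed by io_uring_fill_params() */
		p.cq_entries = cq_entries;
		p.flags = IORING_SETUP_CQSIZE;	/* only RESIZE_FLAGS are accepted */

		/* nr_args must be 1, arg points at the params struct */
		return syscall(__NR_io_uring_register, ring_fd,
			       IORING_REGISTER_RESIZE_RINGS, &p, 1);
	}

On success, the kernel has copied the clamped and rounded sizes back into
the params struct, so the caller can inspect p.sq_entries/p.cq_entries to
see what the rings were actually resized to.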