From patchwork Sat May 13 14:16:40 2023
From: Jens Axboe <axboe@kernel.dk>
To: io-uring@vger.kernel.org
Cc: Jens Axboe <axboe@kernel.dk>
Subject: [PATCH 1/4] io_uring: remove sq/cq_off memset
Date: Sat, 13 May 2023 08:16:40 -0600
Message-Id: <20230513141643.1037620-2-axboe@kernel.dk>
In-Reply-To: <20230513141643.1037620-1-axboe@kernel.dk>
References: <20230513141643.1037620-1-axboe@kernel.dk>
We only have two reserved members we're not clearing; do so manually
instead. This is in preparation for using one of these members for a
new feature.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 io_uring/io_uring.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 3bca7a79efda..3695c5e6fbf0 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3887,7 +3887,6 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
 	if (ret)
 		goto err;

-	memset(&p->sq_off, 0, sizeof(p->sq_off));
 	p->sq_off.head = offsetof(struct io_rings, sq.head);
 	p->sq_off.tail = offsetof(struct io_rings, sq.tail);
 	p->sq_off.ring_mask = offsetof(struct io_rings, sq_ring_mask);
@@ -3895,8 +3894,9 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
 	p->sq_off.flags = offsetof(struct io_rings, sq_flags);
 	p->sq_off.dropped = offsetof(struct io_rings, sq_dropped);
 	p->sq_off.array = (char *)ctx->sq_array - (char *)ctx->rings;
+	p->sq_off.resv1 = 0;
+	p->sq_off.resv2 = 0;

-	memset(&p->cq_off, 0, sizeof(p->cq_off));
 	p->cq_off.head = offsetof(struct io_rings, cq.head);
 	p->cq_off.tail = offsetof(struct io_rings, cq.tail);
 	p->cq_off.ring_mask = offsetof(struct io_rings, cq_ring_mask);
@@ -3904,6 +3904,8 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
 	p->cq_off.overflow = offsetof(struct io_rings, cq_overflow);
 	p->cq_off.cqes = offsetof(struct io_rings, cqes);
 	p->cq_off.flags = offsetof(struct io_rings, cq_flags);
+	p->cq_off.resv1 = 0;
+	p->cq_off.resv2 = 0;

 	p->features = IORING_FEAT_SINGLE_MMAP | IORING_FEAT_NODROP |
 			IORING_FEAT_SUBMIT_STABLE | IORING_FEAT_RW_CUR_POS |
From patchwork Sat May 13 14:16:41 2023
From: Jens Axboe <axboe@kernel.dk>
To: io-uring@vger.kernel.org
Cc: Jens Axboe <axboe@kernel.dk>
Subject: [PATCH 2/4] io_uring: return error pointer from io_mem_alloc()
Date: Sat, 13 May 2023 08:16:41 -0600
Message-Id: <20230513141643.1037620-3-axboe@kernel.dk>
In-Reply-To: <20230513141643.1037620-1-axboe@kernel.dk>
References: <20230513141643.1037620-1-axboe@kernel.dk>

In preparation for having more than one type of ring allocator, make the
existing one return a valid pointer or an error pointer rather than just
NULL.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 io_uring/io_uring.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 3695c5e6fbf0..6266a870c89f 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2712,8 +2712,12 @@ static void io_mem_free(void *ptr)
 static void *io_mem_alloc(size_t size)
 {
 	gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_NOWARN | __GFP_COMP;
+	void *ret;

-	return (void *) __get_free_pages(gfp, get_order(size));
+	ret = (void *) __get_free_pages(gfp, get_order(size));
+	if (ret)
+		return ret;
+	return ERR_PTR(-ENOMEM);
 }

 static unsigned long rings_size(struct io_ring_ctx *ctx, unsigned int sq_entries,
@@ -3673,6 +3677,7 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
 {
 	struct io_rings *rings;
 	size_t size, sq_array_offset;
+	void *ptr;

 	/* make sure these are sane, as we already accounted them */
 	ctx->sq_entries = p->sq_entries;
@@ -3683,8 +3688,8 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
 		return -EOVERFLOW;

 	rings = io_mem_alloc(size);
-	if (!rings)
-		return -ENOMEM;
+	if (IS_ERR(rings))
+		return PTR_ERR(rings);

 	ctx->rings = rings;
 	ctx->sq_array = (u32 *)((char *)rings + sq_array_offset);
@@ -3703,13 +3708,14 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
 		return -EOVERFLOW;
 	}

-	ctx->sq_sqes = io_mem_alloc(size);
-	if (!ctx->sq_sqes) {
+	ptr = io_mem_alloc(size);
+	if (IS_ERR(ptr)) {
 		io_mem_free(ctx->rings);
 		ctx->rings = NULL;
-		return -ENOMEM;
+		return PTR_ERR(ptr);
 	}

+	ctx->sq_sqes = ptr;
 	return 0;
 }
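
[Editor's illustration, not part of the patch: the error-pointer convention
the patch switches to comes from the kernel's <linux/err.h> helpers, which
encode a small negative errno inside an otherwise invalid pointer value so
an allocator can report why it failed without an extra out-parameter. A
minimal sketch of the caller-side pattern follows; demo_alloc() and
demo_caller() are illustrative names only.]

	#include <linux/err.h>
	#include <linux/slab.h>

	static void *demo_alloc(size_t size)
	{
		void *p = kmalloc(size, GFP_KERNEL);

		/* Encode -ENOMEM in the returned pointer instead of NULL */
		return p ? p : ERR_PTR(-ENOMEM);
	}

	static int demo_caller(void)
	{
		void *p = demo_alloc(64);

		if (IS_ERR(p))			/* true for errno-range pointers */
			return PTR_ERR(p);	/* recover the negative errno */
		kfree(p);
		return 0;
	}
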
From patchwork Sat May 13 14:16:42 2023
From: Jens Axboe <axboe@kernel.dk>
To: io-uring@vger.kernel.org
Cc: Jens Axboe <axboe@kernel.dk>
Subject: [PATCH 3/4] io_uring: add ring freeing helper
Date: Sat, 13 May 2023 08:16:42 -0600
Message-Id: <20230513141643.1037620-4-axboe@kernel.dk>
In-Reply-To: <20230513141643.1037620-1-axboe@kernel.dk>
References: <20230513141643.1037620-1-axboe@kernel.dk>

We currently free the rings and the sqes separately; move them into a
helper that does both the freeing and the clearing of the memory.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 io_uring/io_uring.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 6266a870c89f..5433e8d6c481 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2709,6 +2709,14 @@ static void io_mem_free(void *ptr)
 	free_compound_page(page);
 }

+static void io_rings_free(struct io_ring_ctx *ctx)
+{
+	io_mem_free(ctx->rings);
+	io_mem_free(ctx->sq_sqes);
+	ctx->rings = NULL;
+	ctx->sq_sqes = NULL;
+}
+
 static void *io_mem_alloc(size_t size)
 {
 	gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_NOWARN | __GFP_COMP;
@@ -2873,8 +2881,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
 		mmdrop(ctx->mm_account);
 		ctx->mm_account = NULL;
 	}
-	io_mem_free(ctx->rings);
-	io_mem_free(ctx->sq_sqes);
+	io_rings_free(ctx);

 	percpu_ref_exit(&ctx->refs);
 	free_uid(ctx->user);
@@ -3703,15 +3710,13 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
 	else
 		size = array_size(sizeof(struct io_uring_sqe), p->sq_entries);
 	if (size == SIZE_MAX) {
-		io_mem_free(ctx->rings);
-		ctx->rings = NULL;
+		io_rings_free(ctx);
 		return -EOVERFLOW;
 	}

 	ptr = io_mem_alloc(size);
 	if (IS_ERR(ptr)) {
-		io_mem_free(ctx->rings);
-		ctx->rings = NULL;
+		io_rings_free(ctx);
 		return PTR_ERR(ptr);
 	}

From patchwork Sat May 13 14:16:43 2023
From: Jens Axboe <axboe@kernel.dk>
To: io-uring@vger.kernel.org
Cc: Jens Axboe <axboe@kernel.dk>
Subject: [PATCH 4/4] io_uring: support for user allocated memory for rings/sqes
Date: Sat, 13 May 2023 08:16:43 -0600
Message-Id: <20230513141643.1037620-5-axboe@kernel.dk>
In-Reply-To: <20230513141643.1037620-1-axboe@kernel.dk>
References: <20230513141643.1037620-1-axboe@kernel.dk>

Currently io_uring applications must call mmap(2) twice to map the rings
themselves and the sqes array. This works fine, but it does not support
using huge pages to back the rings/sqes. Provide a way for the application
to pass in pre-allocated memory for the rings/sqes, which can then suitably
be allocated from shmfs or via mmap to get huge page support.
Particularly for larger rings, this reduces the number of TLB entries
needed.

If an application wishes to take advantage of that, it must pre-allocate
the memory needed for the sq/cq ring, and the sqes. The former must be
passed in via the io_uring_params->cq_off.user_addr field, while the
latter is passed in via the io_uring_params->sq_off.user_addr field. Then
it must set IORING_SETUP_NO_MMAP in the io_uring_params->flags field, and
io_uring will then map the existing memory into the kernel for shared use.
The application must not call mmap(2) to map rings as it otherwise would
have; that will now fail with -EINVAL if this setup flag was used.

The pages used for the rings and sqes must be contiguous. The intent here
is clearly that huge pages should be used, otherwise the normal setup
procedure works fine as-is. The application may use one huge page for both
the rings and sqes.

Outside of those initialization changes, everything works like it did
before.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 include/linux/io_uring_types.h |  10 +++
 include/uapi/linux/io_uring.h  |   9 ++-
 io_uring/io_uring.c            | 108 ++++++++++++++++++++++++++++++---
 3 files changed, 115 insertions(+), 12 deletions(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 1b2a20a42413..f04ce513fadb 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -211,6 +211,16 @@ struct io_ring_ctx {
 	unsigned int		compat: 1;

 	enum task_work_notify_mode	notify_method;
+
+	/*
+	 * If IORING_SETUP_NO_MMAP is used, then the below holds
+	 * the gup'ed pages for the two rings, and the sqes.
+	 */
+	unsigned short		n_ring_pages;
+	unsigned short		n_sqe_pages;
+	struct page		**ring_pages;
+	struct page		**sqe_pages;
+
 	struct io_rings		*rings;
 	struct task_struct	*submitter_task;
 	struct percpu_ref	refs;

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index 0716cb17e436..2edba9a274de 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -173,6 +173,11 @@ enum {
  */
 #define IORING_SETUP_DEFER_TASKRUN	(1U << 13)

+/*
+ * Application provides the memory for the rings
+ */
+#define IORING_SETUP_NO_MMAP	(1U << 14)
+
 enum io_uring_op {
 	IORING_OP_NOP,
 	IORING_OP_READV,
@@ -406,7 +411,7 @@ struct io_sqring_offsets {
 	__u32 dropped;
 	__u32 array;
 	__u32 resv1;
-	__u64 resv2;
+	__u64 user_addr;
 };

 /*
@@ -425,7 +430,7 @@ struct io_cqring_offsets {
 	__u32 cqes;
 	__u32 flags;
 	__u32 resv1;
-	__u64 resv2;
+	__u64 user_addr;
 };

 /*

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 5433e8d6c481..fccc80c201fb 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2709,12 +2709,85 @@ static void io_mem_free(void *ptr)
 	free_compound_page(page);
 }

+static void io_pages_free(struct page ***pages, int npages)
+{
+	struct page **page_array;
+	int i;
+
+	if (!pages)
+		return;
+	page_array = *pages;
+	for (i = 0; i < npages; i++)
+		unpin_user_page(page_array[i]);
+	kvfree(page_array);
+	*pages = NULL;
+}
+
+static void *__io_uaddr_map(struct page ***pages, unsigned short *npages,
+			    unsigned long uaddr, size_t size)
+{
+	struct page **page_array;
+	unsigned int nr_pages;
+	int ret;
+
+	*npages = 0;
+
+	if (uaddr & (PAGE_SIZE - 1) || !size)
+		return ERR_PTR(-EINVAL);
+
+	nr_pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	if (nr_pages > USHRT_MAX)
+		return ERR_PTR(-EINVAL);
+	page_array = kvmalloc_array(nr_pages, sizeof(struct page *), GFP_KERNEL);
+	if (!page_array)
+		return ERR_PTR(-ENOMEM);
+
+	ret = pin_user_pages_fast(uaddr, nr_pages, FOLL_WRITE | FOLL_LONGTERM,
+				  page_array);
+	if (ret != nr_pages) {
+err:
+		io_pages_free(&page_array, ret > 0 ? ret : 0);
+		return ret < 0 ? ERR_PTR(ret) : ERR_PTR(-EFAULT);
+	}
+	/*
+	 * Should be a single page. If the ring is small enough that we can
+	 * use a normal page, that is fine. If we need multiple pages, then
+	 * userspace should use a huge page. That's the only way to guarantee
+	 * that we get contiguous memory, outside of just being lucky or
+	 * (currently) having low memory fragmentation.
+	 */
+	if (page_array[0] != page_array[ret - 1])
+		goto err;
+	*pages = page_array;
+	*npages = nr_pages;
+	return page_to_virt(page_array[0]);
+}
+
+static void *io_rings_map(struct io_ring_ctx *ctx, unsigned long uaddr,
+			  size_t size)
+{
+	return __io_uaddr_map(&ctx->ring_pages, &ctx->n_ring_pages, uaddr,
+			      size);
+}
+
+static void *io_sqes_map(struct io_ring_ctx *ctx, unsigned long uaddr,
+			 size_t size)
+{
+	return __io_uaddr_map(&ctx->sqe_pages, &ctx->n_sqe_pages, uaddr,
+			      size);
+}
+
 static void io_rings_free(struct io_ring_ctx *ctx)
 {
-	io_mem_free(ctx->rings);
-	io_mem_free(ctx->sq_sqes);
-	ctx->rings = NULL;
-	ctx->sq_sqes = NULL;
+	if (!(ctx->flags & IORING_SETUP_NO_MMAP)) {
+		io_mem_free(ctx->rings);
+		io_mem_free(ctx->sq_sqes);
+		ctx->rings = NULL;
+		ctx->sq_sqes = NULL;
+	} else {
+		io_pages_free(&ctx->ring_pages, ctx->n_ring_pages);
+		io_pages_free(&ctx->sqe_pages, ctx->n_sqe_pages);
+	}
 }

 static void *io_mem_alloc(size_t size)
@@ -3359,6 +3432,10 @@ static void *io_uring_validate_mmap_request(struct file *file,
 	struct page *page;
 	void *ptr;

+	/* Don't allow mmap if the ring was setup without it */
+	if (ctx->flags & IORING_SETUP_NO_MMAP)
+		return ERR_PTR(-EINVAL);
+
 	switch (offset & IORING_OFF_MMAP_MASK) {
 	case IORING_OFF_SQ_RING:
 	case IORING_OFF_CQ_RING:
@@ -3694,7 +3771,11 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
 	if (size == SIZE_MAX)
 		return -EOVERFLOW;

-	rings = io_mem_alloc(size);
+	if (!(ctx->flags & IORING_SETUP_NO_MMAP))
+		rings = io_mem_alloc(size);
+	else
+		rings = io_rings_map(ctx, p->cq_off.user_addr, size);
+
 	if (IS_ERR(rings))
 		return PTR_ERR(rings);

@@ -3714,13 +3795,17 @@ static __cold int io_allocate_scq_urings(struct io_ring_ctx *ctx,
 		return -EOVERFLOW;
 	}

-	ptr = io_mem_alloc(size);
+	if (!(ctx->flags & IORING_SETUP_NO_MMAP))
+		ptr = io_mem_alloc(size);
+	else
+		ptr = io_sqes_map(ctx, p->sq_off.user_addr, size);
+
 	if (IS_ERR(ptr)) {
 		io_rings_free(ctx);
 		return PTR_ERR(ptr);
 	}

 	ctx->sq_sqes = ptr;
 	return 0;
 }

@@ -3906,7 +3991,8 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
 	p->sq_off.dropped = offsetof(struct io_rings, sq_dropped);
 	p->sq_off.array = (char *)ctx->sq_array - (char *)ctx->rings;
 	p->sq_off.resv1 = 0;
-	p->sq_off.resv2 = 0;
+	if (!(ctx->flags & IORING_SETUP_NO_MMAP))
+		p->sq_off.user_addr = 0;

 	p->cq_off.head = offsetof(struct io_rings, cq.head);
 	p->cq_off.tail = offsetof(struct io_rings, cq.tail);
@@ -3916,7 +4002,8 @@ static __cold int io_uring_create(unsigned entries, struct io_uring_params *p,
 	p->cq_off.overflow = offsetof(struct io_rings, cq_overflow);
 	p->cq_off.cqes = offsetof(struct io_rings, cqes);
 	p->cq_off.flags = offsetof(struct io_rings, cq_flags);
 	p->cq_off.resv1 = 0;
-	p->cq_off.resv2 = 0;
+	if (!(ctx->flags & IORING_SETUP_NO_MMAP))
+		p->cq_off.user_addr = 0;

 	p->features = IORING_FEAT_SINGLE_MMAP | IORING_FEAT_NODROP |
 			IORING_FEAT_SUBMIT_STABLE | IORING_FEAT_RW_CUR_POS |
@@ -3982,7 +4069,8 @@ static long io_uring_setup(u32 entries, struct io_uring_params __user *params)
 			IORING_SETUP_R_DISABLED | IORING_SETUP_SUBMIT_ALL |
 			IORING_SETUP_COOP_TASKRUN | IORING_SETUP_TASKRUN_FLAG |
 			IORING_SETUP_SQE128 | IORING_SETUP_CQE32 |
-			IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_DEFER_TASKRUN))
+			IORING_SETUP_SINGLE_ISSUER | IORING_SETUP_DEFER_TASKRUN |
+			IORING_SETUP_NO_MMAP))
 		return -EINVAL;

 	return io_uring_create(entries, &p, params);
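
[Editor's illustration, not part of the patch: a hypothetical userspace
sketch of the setup flow the commit message describes, assuming a kernel
with this series applied and uapi headers that export IORING_SETUP_NO_MMAP
and the user_addr fields. The fixed sizes and the split of one huge page
between the rings and the sqes are simplified; a real application would
size the two regions from the ring geometry, as liburing does. Note that
MAP_HUGETLB requires pre-reserved huge pages on the system.]

	#define _GNU_SOURCE
	#include <linux/io_uring.h>
	#include <stdint.h>
	#include <string.h>
	#include <sys/mman.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	static int setup_ring_no_mmap(unsigned int entries)
	{
		struct io_uring_params p;
		void *mem;

		/* One 2MB huge page backing both the sq/cq rings and the sqes */
		mem = mmap(NULL, 2 * 1024 * 1024, PROT_READ | PROT_WRITE,
			   MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
		if (mem == MAP_FAILED)
			return -1;

		memset(&p, 0, sizeof(p));
		p.flags = IORING_SETUP_NO_MMAP;
		/* Ring memory is passed in via cq_off, sqe memory via sq_off */
		p.cq_off.user_addr = (uintptr_t)mem;
		p.sq_off.user_addr = (uintptr_t)mem + 1024 * 1024;

		/* io_uring_setup(2) has no glibc wrapper; returns the ring fd */
		return (int)syscall(__NR_io_uring_setup, entries, &p);
	}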