From patchwork Fri Oct 25 14:12:58 2024
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13850771
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: Jens Axboe
Subject: [PATCH 1/3] io_uring: switch struct ext_arg from __kernel_timespec to timespec64
Date: Fri, 25 Oct 2024 08:12:58 -0600
Message-ID: <20241025141403.169518-2-axboe@kernel.dk>
In-Reply-To: <20241025141403.169518-1-axboe@kernel.dk>
References: <20241025141403.169518-1-axboe@kernel.dk>

This avoids intermediate storage for turning a __kernel_timespec user
pointer into an on-stack struct timespec64, only to then turn it into
a ktime_t.

Signed-off-by: Jens Axboe
---
 io_uring/io_uring.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 140cd47fbdb3..8f0e0749a581 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2495,9 +2495,10 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
 
 struct ext_arg {
 	size_t argsz;
-	struct __kernel_timespec __user *ts;
+	struct timespec64 ts;
 	const sigset_t __user *sig;
 	ktime_t min_time;
+	bool ts_set;
 };
 
 /*
@@ -2535,13 +2536,8 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events, u32 flags,
 	iowq.timeout = KTIME_MAX;
 	start_time = io_get_time(ctx);
 
-	if (ext_arg->ts) {
-		struct timespec64 ts;
-
-		if (get_timespec64(&ts, ext_arg->ts))
-			return -EFAULT;
-
-		iowq.timeout = timespec64_to_ktime(ts);
+	if (ext_arg->ts_set) {
+		iowq.timeout = timespec64_to_ktime(ext_arg->ts);
 		if (!(flags & IORING_ENTER_ABS_TIMER))
 			iowq.timeout = ktime_add(iowq.timeout, start_time);
 	}
@@ -3252,7 +3248,6 @@ static int io_get_ext_arg(unsigned flags, const void __user *argp,
 	 */
 	if (!(flags & IORING_ENTER_EXT_ARG)) {
 		ext_arg->sig = (const sigset_t __user *) argp;
-		ext_arg->ts = NULL;
 		return 0;
 	}
 
@@ -3267,7 +3262,11 @@ static int io_get_ext_arg(unsigned flags, const void __user *argp,
 	ext_arg->min_time = arg.min_wait_usec * NSEC_PER_USEC;
 	ext_arg->sig = u64_to_user_ptr(arg.sigmask);
 	ext_arg->argsz = arg.sigmask_sz;
-	ext_arg->ts = u64_to_user_ptr(arg.ts);
+	if (arg.ts) {
+		if (get_timespec64(&ext_arg->ts, u64_to_user_ptr(arg.ts)))
+			return -EFAULT;
+		ext_arg->ts_set = true;
+	}
 	return 0;
 }

From patchwork Fri Oct 25 14:12:59 2024
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13850772
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: Jens Axboe, Keith Busch
Subject: [PATCH 2/3] io_uring: change io_get_ext_arg() to use uaccess begin + end
Date: Fri, 25 Oct 2024 08:12:59 -0600
Message-ID: <20241025141403.169518-3-axboe@kernel.dk>
In-Reply-To: <20241025141403.169518-1-axboe@kernel.dk>
References: <20241025141403.169518-1-axboe@kernel.dk>

In scenarios where a high frequency of wait events is seen, the copy
of struct io_uring_getevents_arg is quite noticeable in profiles in
terms of time spent, showing up as 3.5-4.5% of the total. Rewrite the
copy-in logic, saving about 0.5% of the time.
Reviewed-by: Keith Busch
Signed-off-by: Jens Axboe
---
 io_uring/io_uring.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 8f0e0749a581..4cd0ee52710d 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3240,6 +3240,7 @@ static int io_validate_ext_arg(unsigned flags, const void __user *argp, size_t a
 static int io_get_ext_arg(unsigned flags, const void __user *argp,
 			  struct ext_arg *ext_arg)
 {
+	const struct io_uring_getevents_arg __user *uarg = argp;
 	struct io_uring_getevents_arg arg;
 
 	/*
@@ -3257,8 +3258,18 @@ static int io_get_ext_arg(unsigned flags, const void __user *argp,
 	 */
 	if (ext_arg->argsz != sizeof(arg))
 		return -EINVAL;
-	if (copy_from_user(&arg, argp, sizeof(arg)))
+#ifdef CONFIG_64BIT
+	if (!user_access_begin(uarg, sizeof(*uarg)))
 		return -EFAULT;
+	unsafe_get_user(arg.sigmask, &uarg->sigmask, uaccess_end);
+	unsafe_get_user(arg.sigmask_sz, &uarg->sigmask_sz, uaccess_end);
+	unsafe_get_user(arg.min_wait_usec, &uarg->min_wait_usec, uaccess_end);
+	unsafe_get_user(arg.ts, &uarg->ts, uaccess_end);
+	user_access_end();
+#else
+	if (copy_from_user(&arg, uarg, sizeof(arg)))
+		return -EFAULT;
+#endif
 	ext_arg->min_time = arg.min_wait_usec * NSEC_PER_USEC;
 	ext_arg->sig = u64_to_user_ptr(arg.sigmask);
 	ext_arg->argsz = arg.sigmask_sz;
@@ -3268,6 +3279,11 @@ static int io_get_ext_arg(unsigned flags, const void __user *argp,
 		ext_arg->ts_set = true;
 	}
 	return 0;
+#ifdef CONFIG_64BIT
+uaccess_end:
+	user_access_end();
+	return -EFAULT;
+#endif
 }
 
 SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,

From patchwork Fri Oct 25 14:13:00 2024
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13850773
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: Jens Axboe
Subject: [PATCH 3/3] io_uring: add support for fixed wait regions
Date: Fri, 25 Oct 2024 08:13:00 -0600
Message-ID: <20241025141403.169518-4-axboe@kernel.dk>
In-Reply-To: <20241025141403.169518-1-axboe@kernel.dk>
References: <20241025141403.169518-1-axboe@kernel.dk>

Generally applications have one or a few ways of waiting, yet they pass
in a struct io_uring_getevents_arg every time. This needs to get copied
and, in turn, the timeout value inside it needs to get copied. Rather
than do this for every invocation, allow the application to register a
fixed set of wait regions that can simply be indexed when asking the
kernel to wait on events.

At ring setup time, the application can register a number of these wait
regions and initialize region/index 0 upfront:

	struct io_uring_reg_wait *reg;

	reg = io_uring_setup_reg_wait(ring, nr_regions, &ret);

	/* set timeout and mark as set, sigmask/sigmask_sz as needed */
	reg->ts.tv_sec = 0;
	reg->ts.tv_nsec = 100000;
	reg->flags = IORING_REG_WAIT_TS;

where nr_regions >= 1 && nr_regions <= PAGE_SIZE / sizeof(*reg). The
above initializes index 0, but 63 other regions can be initialized,
if needed.

Now, instead of doing:

	struct __kernel_timespec timeout = { .tv_nsec = 100000, };

	io_uring_submit_and_wait_timeout(ring, &cqe, nr, &timeout, NULL);

to wait for events for each submit_and_wait, or just wait, operation, it
can just reference the above region at offset 0 and do:

	io_uring_submit_and_wait_reg(ring, &cqe, nr, 0);

to achieve the same goal of waiting 100usec without needing to copy
both struct io_uring_getevents_arg (24b) and struct __kernel_timespec
(16b) for each invocation. Struct io_uring_reg_wait looks as follows:

	struct io_uring_reg_wait {
		struct __kernel_timespec	ts;
		__u32				min_wait_usec;
		__u32				flags;
		__u64				sigmask;
		__u32				sigmask_sz;
		__u32				pad[3];
		__u64				pad2[2];
	};

embedding the timeout itself in the region, rather than passing it as
a pointer as well.
Note that the signal mask is still passed as a pointer, both for
compatibility reasons and because there don't seem to be many
high-frequency wait scenarios that involve setting and resetting the
signal mask for each wait.

The application is free to modify any region before a wait call, or it
can keep multiple regions with different settings to avoid needing to
modify the same one between wait calls. The registered region must fit
within a page. On a 4kb page size system, that allows for 64 wait
regions if a full page is used, as struct io_uring_reg_wait is 64b in
size. The region's offset must be aligned to the size of struct
io_uring_reg_wait, and it's valid to register fewer than 64 entries.

In network performance testing with zero-copy, this reduced the time
spent waiting on the TX side from 3.12% to 0.3% and the RX side from
4.4% to 0.3%.

Wait regions are fixed for the lifetime of the ring - once registered,
they are persistent until the ring is torn down. The regions support
minimum wait timeout as well as the regular waits.

Signed-off-by: Jens Axboe
---
 include/linux/io_uring_types.h | 10 +++++
 include/uapi/linux/io_uring.h  | 41 +++++++++++++++++
 io_uring/io_uring.c            | 68 +++++++++++++++++++++++-----
 io_uring/register.c            | 82 ++++++++++++++++++++++++++++++++++
 io_uring/register.h            |  1 +
 5 files changed, 191 insertions(+), 11 deletions(-)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 841579dcdae9..2f12828b22a4 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -327,6 +327,14 @@ struct io_ring_ctx {
 		atomic_t		cq_wait_nr;
 		atomic_t		cq_timeouts;
 		struct wait_queue_head	cq_wait;
+
+		/*
+		 * If registered with IORING_REGISTER_CQWAIT_REG, a single
+		 * page holds N entries, mapped in cq_wait_arg. cq_wait_index
+		 * is the maximum allowable index.
+		 */
+		struct io_uring_reg_wait	*cq_wait_arg;
+		unsigned char			cq_wait_index;
 	} ____cacheline_aligned_in_smp;
 
 	/* timeouts */
@@ -430,6 +438,8 @@ struct io_ring_ctx {
 	unsigned short			n_sqe_pages;
 	struct page			**ring_pages;
 	struct page			**sqe_pages;
+
+	struct page			**cq_wait_page;
 };
 
 struct io_tw_state {
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index c4737892c7cd..7dfa046b3c61 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -518,6 +518,7 @@ struct io_cqring_offsets {
 #define IORING_ENTER_EXT_ARG		(1U << 3)
 #define IORING_ENTER_REGISTERED_RING	(1U << 4)
 #define IORING_ENTER_ABS_TIMER		(1U << 5)
+#define IORING_ENTER_EXT_ARG_REG	(1U << 6)
 
 /*
  * Passed in for io_uring_setup(2). Copied back with updated info on success
@@ -618,6 +619,9 @@ enum io_uring_register_op {
 	/* resize CQ ring */
 	IORING_REGISTER_RESIZE_RINGS		= 33,
 
+	/* register fixed io_uring_reg_wait arguments */
+	IORING_REGISTER_CQWAIT_REG		= 34,
+
 	/* this goes last */
 	IORING_REGISTER_LAST,
 
@@ -801,6 +805,43 @@ enum io_uring_register_restriction_op {
 	IORING_RESTRICTION_LAST
 };
 
+enum {
+	IORING_REG_WAIT_TS		= (1U << 0),
+};
+
+/*
+ * Argument for IORING_REGISTER_CQWAIT_REG, registering a region of
+ * struct io_uring_reg_wait that can be indexed when io_uring_enter(2) is
+ * called rather than pass in a wait argument structure separately.
+ */
+struct io_uring_cqwait_reg_arg {
+	__u32		flags;
+	__u32		struct_size;
+	__u32		nr_entries;
+	__u32		pad;
+	__u64		user_addr;
+	__u64		pad2[3];
+};
+
+/*
+ * Argument for io_uring_enter(2) with
+ * IORING_GETEVENTS | IORING_ENTER_EXT_ARG_REG set, where the actual argument
+ * is an index into a previously registered fixed wait region described by
+ * the below structure.
+ */
+struct io_uring_reg_wait {
+	struct __kernel_timespec	ts;
+	__u32				min_wait_usec;
+	__u32				flags;
+	__u64				sigmask;
+	__u32				sigmask_sz;
+	__u32				pad[3];
+	__u64				pad2[2];
+};
+
+/*
+ * Argument for io_uring_enter(2) with IORING_GETEVENTS | IORING_ENTER_EXT_ARG
+ */
 struct io_uring_getevents_arg {
 	__u64	sigmask;
 	__u32	sigmask_sz;
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 4cd0ee52710d..2863b957e373 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2736,6 +2736,7 @@ static __cold void io_ring_ctx_free(struct io_ring_ctx *ctx)
 	io_alloc_cache_free(&ctx->msg_cache, io_msg_cache_free);
 	io_futex_cache_free(ctx);
 	io_destroy_buffers(ctx);
+	io_unregister_cqwait_reg(ctx);
 	mutex_unlock(&ctx->uring_lock);
 	if (ctx->sq_creds)
 		put_cred(ctx->sq_creds);
@@ -3224,21 +3225,43 @@ void __io_uring_cancel(bool cancel_all)
 	io_uring_cancel_generic(cancel_all, NULL);
 }
 
-static int io_validate_ext_arg(unsigned flags, const void __user *argp, size_t argsz)
+static struct io_uring_reg_wait *io_get_ext_arg_reg(struct io_ring_ctx *ctx,
+			const struct io_uring_getevents_arg __user *uarg)
 {
-	if (flags & IORING_ENTER_EXT_ARG) {
-		struct io_uring_getevents_arg arg;
+	struct io_uring_reg_wait *arg = READ_ONCE(ctx->cq_wait_arg);
 
-		if (argsz != sizeof(arg))
+	if (arg) {
+		unsigned int index = (unsigned int) (uintptr_t) uarg;
+
+		if (index <= ctx->cq_wait_index)
+			return arg + index;
+	}
+
+	return ERR_PTR(-EFAULT);
+}
+
+static int io_validate_ext_arg(struct io_ring_ctx *ctx, unsigned flags,
+			       const void __user *argp, size_t argsz)
+{
+	struct io_uring_getevents_arg arg;
+
+	if (!(flags & IORING_ENTER_EXT_ARG))
+		return 0;
+
+	if (flags & IORING_ENTER_EXT_ARG_REG) {
+		if (argsz != sizeof(struct io_uring_reg_wait))
 			return -EINVAL;
-		if (copy_from_user(&arg, argp, sizeof(arg)))
-			return -EFAULT;
+		return PTR_ERR(io_get_ext_arg_reg(ctx, argp));
 	}
+	if (argsz != sizeof(arg))
+		return -EINVAL;
+	if (copy_from_user(&arg, argp, sizeof(arg)))
+		return -EFAULT;
 	return 0;
 }
 
-static int io_get_ext_arg(unsigned flags, const void __user *argp,
-			  struct ext_arg *ext_arg)
+static int io_get_ext_arg(struct io_ring_ctx *ctx, unsigned flags,
+			  const void __user *argp, struct ext_arg *ext_arg)
 {
 	const struct io_uring_getevents_arg __user *uarg = argp;
 	struct io_uring_getevents_arg arg;
@@ -3252,6 +3275,28 @@ static int io_get_ext_arg(unsigned flags, const void __user *argp,
 		return 0;
 	}
 
+	if (flags & IORING_ENTER_EXT_ARG_REG) {
+		struct io_uring_reg_wait *w;
+
+		if (ext_arg->argsz != sizeof(struct io_uring_reg_wait))
+			return -EINVAL;
+		w = io_get_ext_arg_reg(ctx, argp);
+		if (IS_ERR(w))
+			return PTR_ERR(w);
+
+		if (w->flags & ~IORING_REG_WAIT_TS)
+			return -EINVAL;
+		ext_arg->min_time = READ_ONCE(w->min_wait_usec) * NSEC_PER_USEC;
+		ext_arg->sig = u64_to_user_ptr(READ_ONCE(w->sigmask));
+		ext_arg->argsz = READ_ONCE(w->sigmask_sz);
+		if (w->flags & IORING_REG_WAIT_TS) {
+			ext_arg->ts.tv_sec = READ_ONCE(w->ts.tv_sec);
+			ext_arg->ts.tv_nsec = READ_ONCE(w->ts.tv_nsec);
+			ext_arg->ts_set = true;
+		}
+		return 0;
+	}
+
 	/*
 	 * EXT_ARG is set - ensure we agree on the size of it and copy in our
 	 * timespec and sigset_t pointers if good.
@@ -3297,7 +3342,8 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 	if (unlikely(flags & ~(IORING_ENTER_GETEVENTS | IORING_ENTER_SQ_WAKEUP |
 			       IORING_ENTER_SQ_WAIT | IORING_ENTER_EXT_ARG |
 			       IORING_ENTER_REGISTERED_RING |
-			       IORING_ENTER_ABS_TIMER)))
+			       IORING_ENTER_ABS_TIMER |
+			       IORING_ENTER_EXT_ARG_REG)))
 		return -EINVAL;
 
 	/*
@@ -3380,7 +3426,7 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 		 */
 		mutex_lock(&ctx->uring_lock);
 iopoll_locked:
-		ret2 = io_validate_ext_arg(flags, argp, argsz);
+		ret2 = io_validate_ext_arg(ctx, flags, argp, argsz);
 		if (likely(!ret2)) {
 			min_complete = min(min_complete,
 					   ctx->cq_entries);
@@ -3390,7 +3436,7 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
 	} else {
 		struct ext_arg ext_arg = { .argsz = argsz };
 
-		ret2 = io_get_ext_arg(flags, argp, &ext_arg);
+		ret2 = io_get_ext_arg(ctx, flags, argp, &ext_arg);
 		if (likely(!ret2)) {
 			min_complete = min(min_complete,
 					   ctx->cq_entries);
diff --git a/io_uring/register.c b/io_uring/register.c
index fc6c94d694b2..1eb686eaa310 100644
--- a/io_uring/register.c
+++ b/io_uring/register.c
@@ -570,6 +570,82 @@ static int io_register_resize_rings(struct io_ring_ctx *ctx, void __user *arg)
 	return ret;
 }
 
+void io_unregister_cqwait_reg(struct io_ring_ctx *ctx)
+{
+	unsigned short npages = 1;
+
+	if (!ctx->cq_wait_page)
+		return;
+
+	io_pages_unmap(ctx->cq_wait_arg, &ctx->cq_wait_page, &npages, true);
+	ctx->cq_wait_arg = NULL;
+	if (ctx->user)
+		__io_unaccount_mem(ctx->user, 1);
+}
+
+/*
+ * Register a page holding N entries of struct io_uring_reg_wait, which can
+ * be used via io_uring_enter(2) if IORING_GETEVENTS_EXT_ARG_REG is set.
+ * If that is set with IORING_GETEVENTS_EXT_ARG, then instead of passing
+ * in a pointer for a struct io_uring_getevents_arg, an index into this
+ * registered array is passed, avoiding two (arg + timeout) copies per
+ * invocation.
+ */
+static int io_register_cqwait_reg(struct io_ring_ctx *ctx, void __user *uarg)
+{
+	struct io_uring_cqwait_reg_arg arg;
+	struct io_uring_reg_wait *reg;
+	struct page **pages;
+	unsigned long len;
+	int nr_pages, poff;
+	int ret;
+
+	if (ctx->cq_wait_page || ctx->cq_wait_arg)
+		return -EBUSY;
+	if (copy_from_user(&arg, uarg, sizeof(arg)))
+		return -EFAULT;
+	if (!arg.nr_entries || arg.flags)
+		return -EINVAL;
+	if (arg.struct_size != sizeof(*reg))
+		return -EINVAL;
+	if (check_mul_overflow(arg.struct_size, arg.nr_entries, &len))
+		return -EOVERFLOW;
+	if (len > PAGE_SIZE)
+		return -EINVAL;
+	/* offset + len must fit within a page, and must be reg_wait aligned */
+	poff = arg.user_addr & ~PAGE_MASK;
+	if (len + poff > PAGE_SIZE)
+		return -EINVAL;
+	if (poff % arg.struct_size)
+		return -EINVAL;
+
+	pages = io_pin_pages(arg.user_addr, len, &nr_pages);
+	if (IS_ERR(pages))
+		return PTR_ERR(pages);
+	ret = -EINVAL;
+	if (nr_pages != 1)
+		goto out_free;
+	if (ctx->user) {
+		ret = __io_account_mem(ctx->user, 1);
+		if (ret)
+			goto out_free;
+	}
+
+	reg = vmap(pages, 1, VM_MAP, PAGE_KERNEL);
+	if (reg) {
+		ctx->cq_wait_index = arg.nr_entries - 1;
+		WRITE_ONCE(ctx->cq_wait_page, pages);
+		WRITE_ONCE(ctx->cq_wait_arg, (void *) reg + poff);
+		return 0;
+	}
+	ret = -ENOMEM;
+	if (ctx->user)
+		__io_unaccount_mem(ctx->user, 1);
+out_free:
+	io_pages_free(&pages, nr_pages);
+	return ret;
+}
+
 static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
 			       void __user *arg, unsigned nr_args)
 	__releases(ctx->uring_lock)
@@ -764,6 +840,12 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode,
 			break;
 		ret = io_register_resize_rings(ctx, arg);
 		break;
+	case IORING_REGISTER_CQWAIT_REG:
+		ret = -EINVAL;
+		if (!arg || nr_args != 1)
+			break;
+		ret = io_register_cqwait_reg(ctx, arg);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/io_uring/register.h b/io_uring/register.h
index a5f39d5ef9e0..3e935e8fa4b2 100644
--- a/io_uring/register.h
+++ b/io_uring/register.h
@@ -5,5 +5,6 @@
 int io_eventfd_unregister(struct io_ring_ctx *ctx);
 int io_unregister_personality(struct io_ring_ctx *ctx, unsigned id);
 struct file *io_uring_register_get_file(unsigned int fd, bool registered);
+void io_unregister_cqwait_reg(struct io_ring_ctx *ctx);
 
 #endif