From patchwork Mon Feb 3 16:23:39 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13957800
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, brauner@kernel.org, Jens Axboe
Subject: [PATCH 1/9] eventpoll: abstract out main epoll reaper into a function
Date: Mon, 3 Feb 2025 09:23:39 -0700
Message-ID: <20250203163114.124077-2-axboe@kernel.dk>
X-Mailer: git-send-email 2.47.2
In-Reply-To: <20250203163114.124077-1-axboe@kernel.dk>
References: <20250203163114.124077-1-axboe@kernel.dk>
Precedence: bulk
X-Mailing-List: io-uring@vger.kernel.org

Add epoll_wait(), which takes a struct file and the number of events etc
to reap. This can then be called by do_epoll_wait(), and used by io_uring
as well.

No intended functional changes in this patch.

Signed-off-by: Jens Axboe
---
 fs/eventpoll.c            | 31 ++++++++++++++++++-------------
 include/linux/eventpoll.h |  4 ++++
 2 files changed, 22 insertions(+), 13 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 7c0980db77b3..73b639caed3d 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2445,12 +2445,8 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
 	return do_epoll_ctl(epfd, op, fd, &epds, false);
 }
 
-/*
- * Implement the event wait interface for the eventpoll file. It is the kernel
- * part of the user space epoll_wait(2).
- */
-static int do_epoll_wait(int epfd, struct epoll_event __user *events,
-			 int maxevents, struct timespec64 *to)
+int epoll_wait(struct file *file, struct epoll_event __user *events,
+	       int maxevents, struct timespec64 *to)
 {
 	struct eventpoll *ep;
 
@@ -2462,28 +2458,37 @@ static int do_epoll_wait(int epfd, struct epoll_event __user *events,
 	if (!access_ok(events, maxevents * sizeof(struct epoll_event)))
 		return -EFAULT;
 
-	/* Get the "struct file *" for the eventpoll file */
-	CLASS(fd, f)(epfd);
-	if (fd_empty(f))
-		return -EBADF;
-
 	/*
 	 * We have to check that the file structure underneath the fd
 	 * the user passed to us _is_ an eventpoll file.
 	 */
-	if (!is_file_epoll(fd_file(f)))
+	if (!is_file_epoll(file))
 		return -EINVAL;
 
 	/*
 	 * At this point it is safe to assume that the "private_data" contains
 	 * our own data structure.
 	 */
-	ep = fd_file(f)->private_data;
+	ep = file->private_data;
 
 	/* Time to fish for events ... */
 	return ep_poll(ep, events, maxevents, to);
 }
 
+/*
+ * Implement the event wait interface for the eventpoll file. It is the kernel
+ * part of the user space epoll_wait(2).
+ */
+static int do_epoll_wait(int epfd, struct epoll_event __user *events,
+			 int maxevents, struct timespec64 *to)
+{
+	/* Get the "struct file *" for the eventpoll file */
+	CLASS(fd, f)(epfd);
+	if (!fd_empty(f))
+		return epoll_wait(fd_file(f), events, maxevents, to);
+	return -EBADF;
+}
+
 SYSCALL_DEFINE4(epoll_wait, int, epfd, struct epoll_event __user *, events,
 		int, maxevents, int, timeout)
 {
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
index 0c0d00fcd131..f37fea931c44 100644
--- a/include/linux/eventpoll.h
+++ b/include/linux/eventpoll.h
@@ -25,6 +25,10 @@ struct file *get_epoll_tfile_raw_ptr(struct file *file, int tfd, unsigned long t
 /* Used to release the epoll bits inside the "struct file" */
 void eventpoll_release_file(struct file *file);
 
+/* Use to reap events */
+int epoll_wait(struct file *file, struct epoll_event __user *events,
+	       int maxevents, struct timespec64 *to);
+
 /*
  * This is called from inside fs/file_table.c:__fput() to unlink files
  * from the eventpoll interface. We need to have this facility to cleanup

From patchwork Mon Feb 3 16:23:40 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13957801
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, brauner@kernel.org, Jens Axboe
Subject: [PATCH 2/9] eventpoll: add helper to remove wait entry from wait queue head
Date: Mon, 3 Feb 2025 09:23:40 -0700
Message-ID: <20250203163114.124077-3-axboe@kernel.dk>
X-Mailer: git-send-email 2.47.2
In-Reply-To: <20250203163114.124077-1-axboe@kernel.dk>
References: <20250203163114.124077-1-axboe@kernel.dk>
Precedence: bulk
X-Mailing-List: io-uring@vger.kernel.org

__epoll_wait_remove() is the core helper, it kills a given
wait_queue_entry from the eventpoll wait_queue_head. Use it internally,
and provide an overall helper, epoll_wait_remove(), which takes a struct
file and provides the same functionality.

Signed-off-by: Jens Axboe
---
 fs/eventpoll.c            | 58 +++++++++++++++++++++++++--------------
 include/linux/eventpoll.h |  3 ++
 2 files changed, 40 insertions(+), 21 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 73b639caed3d..01edbee5c766 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1980,6 +1980,42 @@ static int ep_autoremove_wake_function(struct wait_queue_entry *wq_entry,
 	return ret;
 }
 
+static int __epoll_wait_remove(struct eventpoll *ep,
+			       struct wait_queue_entry *wait, int timed_out)
+{
+	int eavail;
+
+	/*
+	 * We were woken up, thus go and try to harvest some events. If timed
+	 * out and still on the wait queue, recheck eavail carefully under
+	 * lock, below.
+	 */
+	eavail = 1;
+
+	if (!list_empty_careful(&wait->entry)) {
+		write_lock_irq(&ep->lock);
+		/*
+		 * If the thread timed out and is not on the wait queue, it
+		 * means that the thread was woken up after its timeout expired
+		 * before it could reacquire the lock. Thus, when wait.entry is
+		 * empty, it needs to harvest events.
+		 */
+		if (timed_out)
+			eavail = list_empty(&wait->entry);
+		__remove_wait_queue(&ep->wq, wait);
+		write_unlock_irq(&ep->lock);
+	}
+
+	return eavail;
+}
+
+int epoll_wait_remove(struct file *file, struct wait_queue_entry *wait)
+{
+	if (is_file_epoll(file))
+		return __epoll_wait_remove(file->private_data, wait, false);
+	return -EINVAL;
+}
+
 /**
  * ep_poll - Retrieves ready events, and delivers them to the caller-supplied
  *           event buffer.
@@ -2100,27 +2136,7 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 						      HRTIMER_MODE_ABS);
 		__set_current_state(TASK_RUNNING);
 
-		/*
-		 * We were woken up, thus go and try to harvest some events.
-		 * If timed out and still on the wait queue, recheck eavail
-		 * carefully under lock, below.
-		 */
-		eavail = 1;
-
-		if (!list_empty_careful(&wait.entry)) {
-			write_lock_irq(&ep->lock);
-			/*
-			 * If the thread timed out and is not on the wait queue,
-			 * it means that the thread was woken up after its
-			 * timeout expired before it could reacquire the lock.
-			 * Thus, when wait.entry is empty, it needs to harvest
-			 * events.
-			 */
-			if (timed_out)
-				eavail = list_empty(&wait.entry);
-			__remove_wait_queue(&ep->wq, &wait);
-			write_unlock_irq(&ep->lock);
-		}
+		eavail = __epoll_wait_remove(ep, &wait, timed_out);
 	}
 }
 
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
index f37fea931c44..1301fc74aca0 100644
--- a/include/linux/eventpoll.h
+++ b/include/linux/eventpoll.h
@@ -29,6 +29,9 @@ void eventpoll_release_file(struct file *file);
 int epoll_wait(struct file *file, struct epoll_event __user *events,
 	       int maxevents, struct timespec64 *to);
 
+/* Remove wait entry */
+int epoll_wait_remove(struct file *file, struct wait_queue_entry *wait);
+
 /*
  * This is called from inside fs/file_table.c:__fput() to unlink files
  * from the eventpoll interface. We need to have this facility to cleanup

From patchwork Mon Feb 3 16:23:41 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13957802
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, brauner@kernel.org, Jens Axboe
Subject: [PATCH 3/9] eventpoll: abstract out ep_try_send_events() helper
Date: Mon, 3 Feb 2025 09:23:41 -0700
Message-ID: <20250203163114.124077-4-axboe@kernel.dk>
X-Mailer: git-send-email 2.47.2
In-Reply-To: <20250203163114.124077-1-axboe@kernel.dk>
References: <20250203163114.124077-1-axboe@kernel.dk>
Precedence: bulk
X-Mailing-List: io-uring@vger.kernel.org

In preparation for reusing this helper in another epoll setup helper,
abstract it out.

Signed-off-by: Jens Axboe
---
 fs/eventpoll.c | 28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 01edbee5c766..3cbd290503c7 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2016,6 +2016,22 @@ int epoll_wait_remove(struct file *file, struct wait_queue_entry *wait)
 	return -EINVAL;
 }
 
+static int ep_try_send_events(struct eventpoll *ep,
+			      struct epoll_event __user *events, int maxevents)
+{
+	int res;
+
+	/*
+	 * Try to transfer events to user space. In case we get 0 events and
+	 * there's still timeout left over, we go trying again in search of
+	 * more luck.
+	 */
+	res = ep_send_events(ep, events, maxevents);
+	if (res > 0)
+		ep_suspend_napi_irqs(ep);
+	return res;
+}
+
 /**
  * ep_poll - Retrieves ready events, and delivers them to the caller-supplied
  *           event buffer.
@@ -2067,17 +2083,9 @@ static int ep_poll(struct eventpoll *ep, struct epoll_event __user *events,
 
 	while (1) {
 		if (eavail) {
-			/*
-			 * Try to transfer events to user space. In case we get
-			 * 0 events and there's still timeout left over, we go
-			 * trying again in search of more luck.
-			 */
-			res = ep_send_events(ep, events, maxevents);
-			if (res) {
-				if (res > 0)
-					ep_suspend_napi_irqs(ep);
+			res = ep_try_send_events(ep, events, maxevents);
+			if (res)
 				return res;
-			}
 		}
 
 		if (timed_out)

From patchwork Mon Feb 3 16:23:42 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13957803
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, brauner@kernel.org, Jens Axboe
Subject: [PATCH 4/9] eventpoll: add struct wait_queue_entry argument to epoll_wait()
Date: Mon, 3 Feb 2025 09:23:42 -0700
Message-ID: <20250203163114.124077-5-axboe@kernel.dk>
X-Mailer: git-send-email 2.47.2
In-Reply-To: <20250203163114.124077-1-axboe@kernel.dk>
References: <20250203163114.124077-1-axboe@kernel.dk>
Precedence: bulk
X-Mailing-List: io-uring@vger.kernel.org

In preparation for allowing an outside caller to add itself to the epoll
waitqueue, pass in a struct wait_queue_entry. Unused in its current form,
but will be utilized shortly.

No intended functional changes in this patch.

Signed-off-by: Jens Axboe
---
 fs/eventpoll.c            | 5 +++--
 include/linux/eventpoll.h | 3 ++-
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 3cbd290503c7..ecaa5591f4be 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2470,7 +2470,8 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
 }
 
 int epoll_wait(struct file *file, struct epoll_event __user *events,
-	       int maxevents, struct timespec64 *to)
+	       int maxevents, struct timespec64 *to,
+	       struct wait_queue_entry *wait)
 {
 	struct eventpoll *ep;
 
@@ -2509,7 +2510,7 @@ static int do_epoll_wait(int epfd, struct epoll_event __user *events,
 	/* Get the "struct file *" for the eventpoll file */
 	CLASS(fd, f)(epfd);
 	if (!fd_empty(f))
-		return epoll_wait(fd_file(f), events, maxevents, to);
+		return epoll_wait(fd_file(f), events, maxevents, to, NULL);
 	return -EBADF;
 }
 
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
index 1301fc74aca0..24f9344df5a3 100644
--- a/include/linux/eventpoll.h
+++ b/include/linux/eventpoll.h
@@ -27,7 +27,8 @@ void eventpoll_release_file(struct file *file);
 
 /* Use to reap events */
 int epoll_wait(struct file *file, struct epoll_event __user *events,
-	       int maxevents, struct timespec64 *to);
+	       int maxevents, struct timespec64 *to,
+	       struct wait_queue_entry *wait);
 
 /* Remove wait entry */
 int epoll_wait_remove(struct file *file, struct wait_queue_entry *wait);

From patchwork Mon Feb 3 16:23:43 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13957804
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, brauner@kernel.org, Jens Axboe
Subject: [PATCH 5/9] eventpoll: add ep_poll_queue() loop
Date: Mon, 3 Feb 2025 09:23:43 -0700
Message-ID: <20250203163114.124077-6-axboe@kernel.dk>
X-Mailer: git-send-email 2.47.2
In-Reply-To: <20250203163114.124077-1-axboe@kernel.dk>
References: <20250203163114.124077-1-axboe@kernel.dk>
Precedence: bulk
X-Mailing-List: io-uring@vger.kernel.org

If a wait_queue_entry is passed in to epoll_wait(), then utilize this
new helper for reaping events and/or adding to the epoll waitqueue
rather than calling the potentially sleeping ep_poll(). It works like
ep_poll(), except it doesn't block - it either returns the events that
are already available, or it adds the specified entry to the struct
eventpoll waitqueue to get a callback when events are triggered. It
returns -EIOCBQUEUED for that case.

Signed-off-by: Jens Axboe
---
 fs/eventpoll.c | 37 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index ecaa5591f4be..a8be0c7110e4 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2032,6 +2032,39 @@ static int ep_try_send_events(struct eventpoll *ep,
 	return res;
 }
 
+static int ep_poll_queue(struct eventpoll *ep,
+			 struct epoll_event __user *events, int maxevents,
+			 struct wait_queue_entry *wait)
+{
+	int res, eavail;
+
+	/* See ep_poll() for commentary */
+	eavail = ep_events_available(ep);
+	while (1) {
+		if (eavail) {
+			res = ep_try_send_events(ep, events, maxevents);
+			if (res)
+				return res;
+		}
+
+		eavail = ep_busy_loop(ep, true);
+		if (eavail)
+			continue;
+
+		if (!list_empty_careful(&wait->entry))
+			return -EIOCBQUEUED;
+
+		write_lock_irq(&ep->lock);
+		eavail = ep_events_available(ep);
+		if (!eavail)
+			__add_wait_queue_exclusive(&ep->wq, wait);
+		write_unlock_irq(&ep->lock);
+
+		if (!eavail)
+			return -EIOCBQUEUED;
+	}
+}
+
 /**
  * ep_poll - Retrieves ready events, and delivers them to the caller-supplied
  *           event buffer.
@@ -2497,7 +2530,9 @@ int epoll_wait(struct file *file, struct epoll_event __user *events,
 	ep = file->private_data;
 
 	/* Time to fish for events ... */
-	return ep_poll(ep, events, maxevents, to);
+	if (!wait)
+		return ep_poll(ep, events, maxevents, to);
+	return ep_poll_queue(ep, events, maxevents, wait);
 }
 
 /*

From patchwork Mon Feb 3 16:23:44 2025
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13957805
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, brauner@kernel.org, Jens Axboe
Subject: [PATCH 6/9] io_uring/epoll: remove CONFIG_EPOLL guards
Date: Mon, 3 Feb 2025 09:23:44 -0700
Message-ID: <20250203163114.124077-7-axboe@kernel.dk>
In-Reply-To: <20250203163114.124077-1-axboe@kernel.dk>
References: <20250203163114.124077-1-axboe@kernel.dk>

Just have the Makefile add the object if epoll is enabled, then it's
not necessary to guard the entire epoll.c file inside a CONFIG_EPOLL
ifdef.

Signed-off-by: Jens Axboe
---
 io_uring/Makefile | 9 +++++----
 io_uring/epoll.c  | 2 --
 2 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/io_uring/Makefile b/io_uring/Makefile
index d695b60dba4f..7114a6dbd439 100644
--- a/io_uring/Makefile
+++ b/io_uring/Makefile
@@ -11,9 +11,10 @@ obj-$(CONFIG_IO_URING)		+= io_uring.o opdef.o kbuf.o rsrc.o notif.o \
 					eventfd.o uring_cmd.o openclose.o \
 					sqpoll.o xattr.o nop.o fs.o splice.o \
 					sync.o msg_ring.o advise.o openclose.o \
-					epoll.o statx.o timeout.o fdinfo.o \
-					cancel.o waitid.o register.o \
-					truncate.o memmap.o alloc_cache.o
+					statx.o timeout.o fdinfo.o cancel.o \
+					waitid.o register.o truncate.o \
+					memmap.o alloc_cache.o
 obj-$(CONFIG_IO_WQ)		+= io-wq.o
 obj-$(CONFIG_FUTEX)		+= futex.o
-obj-$(CONFIG_NET_RX_BUSY_POLL) += napi.o
+obj-$(CONFIG_EPOLL)		+= epoll.o
+obj-$(CONFIG_NET_RX_BUSY_POLL)	+= napi.o
diff --git a/io_uring/epoll.c b/io_uring/epoll.c
index 89bff2068a19..7848d9cc073d 100644
--- a/io_uring/epoll.c
+++ b/io_uring/epoll.c
@@ -12,7 +12,6 @@
 #include "io_uring.h"
 #include "epoll.h"
 
-#if defined(CONFIG_EPOLL)
 struct io_epoll {
 	struct file			*file;
 	int				epfd;
@@ -58,4 +57,3 @@ int io_epoll_ctl(struct io_kiocb *req, unsigned int issue_flags)
 	io_req_set_res(req, ret, 0);
 	return IOU_OK;
 }
-#endif

From patchwork Mon Feb 3 16:23:45 2025
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13957806
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, brauner@kernel.org, Jens Axboe
Subject: [PATCH 7/9] io_uring/poll: pull ownership handling into poll.h
Date: Mon, 3 Feb 2025 09:23:45 -0700
Message-ID: <20250203163114.124077-8-axboe@kernel.dk>
In-Reply-To: <20250203163114.124077-1-axboe@kernel.dk>
References: <20250203163114.124077-1-axboe@kernel.dk>

In preparation for using it from somewhere else. Rather than try and
duplicate the functionality, just make it generically available to
io_uring opcodes.

Note: this has to be used carefully; it cannot be used by opcodes that
can trigger poll logic.

Signed-off-by: Jens Axboe
---
 io_uring/poll.c | 30 +-----------------------------
 io_uring/poll.h | 31 +++++++++++++++++++++++++++++++
 2 files changed, 32 insertions(+), 29 deletions(-)

diff --git a/io_uring/poll.c b/io_uring/poll.c
index bb1c0cd4f809..5e44ac562491 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -41,16 +41,6 @@ struct io_poll_table {
 	__poll_t result_mask;
 };
 
-#define IO_POLL_CANCEL_FLAG	BIT(31)
-#define IO_POLL_RETRY_FLAG	BIT(30)
-#define IO_POLL_REF_MASK	GENMASK(29, 0)
-
-/*
- * We usually have 1-2 refs taken, 128 is more than enough and we want to
- * maximise the margin between this amount and the moment when it overflows.
- */
-#define IO_POLL_REF_BIAS	128
-
 #define IO_WQE_F_DOUBLE		1
 
 static int io_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
@@ -70,7 +60,7 @@ static inline bool wqe_is_double(struct wait_queue_entry *wqe)
 	return priv & IO_WQE_F_DOUBLE;
 }
 
-static bool io_poll_get_ownership_slowpath(struct io_kiocb *req)
+bool io_poll_get_ownership_slowpath(struct io_kiocb *req)
 {
 	int v;
 
@@ -85,24 +75,6 @@ static bool io_poll_get_ownership_slowpath(struct io_kiocb *req)
 	return !(atomic_fetch_inc(&req->poll_refs) & IO_POLL_REF_MASK);
 }
 
-/*
- * If refs part of ->poll_refs (see IO_POLL_REF_MASK) is 0, it's free. We can
- * bump it and acquire ownership. It's disallowed to modify requests while not
- * owning it, that prevents from races for enqueueing task_work's and b/w
- * arming poll and wakeups.
- */
-static inline bool io_poll_get_ownership(struct io_kiocb *req)
-{
-	if (unlikely(atomic_read(&req->poll_refs) >= IO_POLL_REF_BIAS))
-		return io_poll_get_ownership_slowpath(req);
-	return !(atomic_fetch_inc(&req->poll_refs) & IO_POLL_REF_MASK);
-}
-
-static void io_poll_mark_cancelled(struct io_kiocb *req)
-{
-	atomic_or(IO_POLL_CANCEL_FLAG, &req->poll_refs);
-}
-
 static struct io_poll *io_poll_get_double(struct io_kiocb *req)
 {
 	/* pure poll stashes this in ->async_data, poll driven retry elsewhere */
diff --git a/io_uring/poll.h b/io_uring/poll.h
index 04ede93113dc..2f416cd3be13 100644
--- a/io_uring/poll.h
+++ b/io_uring/poll.h
@@ -21,6 +21,18 @@ struct async_poll {
 	struct io_poll		*double_poll;
 };
 
+#define IO_POLL_CANCEL_FLAG	BIT(31)
+#define IO_POLL_RETRY_FLAG	BIT(30)
+#define IO_POLL_REF_MASK	GENMASK(29, 0)
+
+bool io_poll_get_ownership_slowpath(struct io_kiocb *req);
+
+/*
+ * We usually have 1-2 refs taken, 128 is more than enough and we want to
+ * maximise the margin between this amount and the moment when it overflows.
+ */
+#define IO_POLL_REF_BIAS	128
+
 /*
  * Must only be called inside issue_flags & IO_URING_F_MULTISHOT, or
  * potentially other cases where we already "own" this poll request.
@@ -30,6 +42,25 @@ static inline void io_poll_multishot_retry(struct io_kiocb *req)
 	atomic_inc(&req->poll_refs);
 }
 
+/*
+ * If refs part of ->poll_refs (see IO_POLL_REF_MASK) is 0, it's free. We can
+ * bump it and acquire ownership. It's disallowed to modify requests while not
+ * owning it, that prevents from races for enqueueing task_work's and b/w
+ * arming poll and wakeups.
+ */
+static inline bool io_poll_get_ownership(struct io_kiocb *req)
+{
+	if (unlikely(atomic_read(&req->poll_refs) >= IO_POLL_REF_BIAS))
+		return io_poll_get_ownership_slowpath(req);
+	return !(atomic_fetch_inc(&req->poll_refs) & IO_POLL_REF_MASK);
+}
+
+static inline void io_poll_mark_cancelled(struct io_kiocb *req)
+{
+	atomic_or(IO_POLL_CANCEL_FLAG, &req->poll_refs);
+}
+
+
 int io_poll_add_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
 int io_poll_add(struct io_kiocb *req, unsigned int issue_flags);
 

From patchwork Mon Feb 3 16:23:46 2025
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13957807
From: Jens Axboe
To: io-uring@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, brauner@kernel.org, Jens Axboe
Subject: [PATCH 8/9] io_uring/epoll: add support for IORING_OP_EPOLL_WAIT
Date: Mon, 3 Feb 2025 09:23:46 -0700
Message-ID: <20250203163114.124077-9-axboe@kernel.dk>
In-Reply-To: <20250203163114.124077-1-axboe@kernel.dk>
References: <20250203163114.124077-1-axboe@kernel.dk>

For existing epoll event loops that can't fully convert to io_uring,
the usual approach is to add the io_uring fd to the epoll instance and
use epoll_wait() to wait on both "legacy" and io_uring events. While
this works, it isn't optimal as:

1) epoll_wait() is pretty limited in what it can do. It does not
   support partial reaping of events, or waiting on a batch of events.

2) When an io_uring ring is added to an epoll instance, it activates
   the io_uring "I'm being polled" logic which slows things down.

Rather than use this approach, with EPOLL_WAIT support added to
io_uring, event loops can use the normal io_uring wait logic for
everything, as long as an epoll wait request has been armed with
io_uring.

Note that IORING_OP_EPOLL_WAIT does NOT take a timeout value, as this
is an async request. Waiting on io_uring events in general has various
timeout parameters, and those are the ones that should be used when
waiting on any kind of request. If events are immediately available for
reaping, then this opcode will return those immediately. If none are
available, then it will post an async completion when they become
available.

cqe->res will contain either an error code (< 0 value) for a malformed
request, invalid epoll instance, etc., or a positive result indicating
how many events were reaped.

IORING_OP_EPOLL_WAIT requests may be canceled using the normal io_uring
cancelation infrastructure. The poll logic for managing ownership is
adopted to guard the epoll side too.
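
As a rough illustration of the ownership scheme mentioned above, here
is a minimal userspace sketch using C11 atomics. It is not the kernel
code: the sketch_* names and struct sketch_req are hypothetical
stand-ins, the IO_POLL_REF_BIAS slow path is left out, and the drop
helper is a simplified take on the loop in io_epoll_retry(). It only
shows the idea that whoever bumps the reference part of ->poll_refs
from zero owns the request, while a canceler merely sets a flag for the
owner to notice.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-ins for the kernel's IO_POLL_* constants (illustrative). */
#define SKETCH_CANCEL_FLAG	(1u << 31)		/* mirrors BIT(31) */
#define SKETCH_REF_MASK		((1u << 30) - 1u)	/* mirrors GENMASK(29, 0) */

/* Hypothetical request type; only the refcount word matters here. */
struct sketch_req {
	atomic_uint poll_refs;
};

/* Whoever bumps the reference part from 0 becomes the owner. */
static bool sketch_get_ownership(struct sketch_req *req)
{
	return !(atomic_fetch_add(&req->poll_refs, 1u) & SKETCH_REF_MASK);
}

/* Cancelation only sets a flag; the owner acts on it later. */
static void sketch_mark_cancelled(struct sketch_req *req)
{
	atomic_fetch_or(&req->poll_refs, SKETCH_CANCEL_FLAG);
}

/*
 * Owner side: drop every reference seen so far. Returns true if new
 * references arrived in the meantime, meaning another pass is needed.
 * *cancelled reports whether the cancel flag was set.
 */
static bool sketch_drop_refs(struct sketch_req *req, bool *cancelled)
{
	unsigned int v = atomic_load(&req->poll_refs);

	*cancelled = (v & SKETCH_CANCEL_FLAG) != 0;
	v &= SKETCH_REF_MASK;
	/* atomic_fetch_sub() returns the old value, so subtract again. */
	return ((atomic_fetch_sub(&req->poll_refs, v) - v) & SKETCH_REF_MASK) != 0;
}

/* Walk through one wakeup racing with one cancel; returns 0 on success. */
static int sketch_demo(void)
{
	struct sketch_req req;
	bool cancelled;

	atomic_init(&req.poll_refs, 0u);

	if (!sketch_get_ownership(&req))	/* wakeup path wins ownership */
		return 1;
	sketch_mark_cancelled(&req);		/* canceler flags the request... */
	if (sketch_get_ownership(&req))		/* ...but must not become owner */
		return 2;
	if (sketch_drop_refs(&req, &cancelled))	/* both refs dropped at once */
		return 3;
	return cancelled ? 0 : 4;		/* owner observes the cancel */
}
```

Calling sketch_demo() from a small main() walks through the race: the
second caller fails to take ownership, and the first one sees the
cancel flag when it drops its references.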

Signed-off-by: Jens Axboe
---
 include/linux/io_uring_types.h |   4 +
 include/uapi/linux/io_uring.h  |   1 +
 io_uring/cancel.c              |   5 +
 io_uring/epoll.c               | 168 +++++++++++++++++++++++++++++++++
 io_uring/epoll.h               |  22 +++++
 io_uring/io_uring.c            |   5 +
 io_uring/opdef.c               |  14 +++
 7 files changed, 219 insertions(+)

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 3def525a1da3..ee56992d31d5 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -370,6 +370,10 @@ struct io_ring_ctx {
 	struct io_alloc_cache	futex_cache;
 #endif
 
+#ifdef CONFIG_EPOLL
+	struct hlist_head	epoll_list;
+#endif
+
 	const struct cred	*sq_creds;	/* cred used for __io_sq_thread() */
 	struct io_sq_data	*sq_data;	/* if using sq thread polling */
 
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index e11c82638527..a559e1e1544a 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -278,6 +278,7 @@ enum io_uring_op {
 	IORING_OP_FTRUNCATE,
 	IORING_OP_BIND,
 	IORING_OP_LISTEN,
+	IORING_OP_EPOLL_WAIT,
 
 	/* this goes last, obviously */
 	IORING_OP_LAST,
diff --git a/io_uring/cancel.c b/io_uring/cancel.c
index 484193567839..9cebd0145cb4 100644
--- a/io_uring/cancel.c
+++ b/io_uring/cancel.c
@@ -17,6 +17,7 @@
 #include "timeout.h"
 #include "waitid.h"
 #include "futex.h"
+#include "epoll.h"
 #include "cancel.h"
 
 struct io_cancel {
@@ -128,6 +129,10 @@ int io_try_cancel(struct io_uring_task *tctx, struct io_cancel_data *cd,
 	if (ret != -ENOENT)
 		return ret;
 
+	ret = io_epoll_wait_cancel(ctx, cd, issue_flags);
+	if (ret != -ENOENT)
+		return ret;
+
 	spin_lock(&ctx->completion_lock);
 	if (!(cd->flags & IORING_ASYNC_CANCEL_FD))
 		ret = io_timeout_cancel(ctx, cd);
diff --git a/io_uring/epoll.c b/io_uring/epoll.c
index 7848d9cc073d..2a9c679516c8 100644
--- a/io_uring/epoll.c
+++ b/io_uring/epoll.c
@@ -11,6 +11,7 @@
 
 #include "io_uring.h"
 #include "epoll.h"
+#include "poll.h"
 
 struct io_epoll {
 	struct file			*file;
@@ -20,6 +21,13 @@ struct io_epoll {
 	struct epoll_event		event;
 };
 
+struct io_epoll_wait {
+	struct file			*file;
+	int				maxevents;
+	struct epoll_event __user	*events;
+	struct wait_queue_entry		wait;
+};
+
 int io_epoll_ctl_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	struct io_epoll *epoll = io_kiocb_to_cmd(req, struct io_epoll);
@@ -57,3 +65,163 @@ int io_epoll_ctl(struct io_kiocb *req, unsigned int issue_flags)
 	io_req_set_res(req, ret, 0);
 	return IOU_OK;
 }
+
+static void __io_epoll_cancel(struct io_kiocb *req)
+{
+	struct io_epoll_wait *iew = io_kiocb_to_cmd(req, struct io_epoll_wait);
+
+	epoll_wait_remove(req->file, &iew->wait);
+	hlist_del_init(&req->hash_node);
+	io_req_set_res(req, -ECANCELED, 0);
+	req->io_task_work.func = io_req_task_complete;
+	io_req_task_work_add(req);
+}
+
+static void __io_epoll_wait_cancel(struct io_kiocb *req)
+{
+	io_poll_mark_cancelled(req);
+	if (io_poll_get_ownership(req)) {
+		__io_epoll_cancel(req);
+		io_req_set_res(req, -ECANCELED, 0);
+		req->io_task_work.func = io_req_task_complete;
+		io_req_task_work_add(req);
+	}
+}
+
+bool io_epoll_wait_remove_all(struct io_ring_ctx *ctx, struct io_uring_task *tctx,
+			      bool cancel_all)
+{
+	struct hlist_node *tmp;
+	struct io_kiocb *req;
+	bool found = false;
+
+	lockdep_assert_held(&ctx->uring_lock);
+
+	hlist_for_each_entry_safe(req, tmp, &ctx->epoll_list, hash_node) {
+		if (!io_match_task_safe(req, tctx, cancel_all))
+			continue;
+		__io_epoll_wait_cancel(req);
+		found = true;
+	}
+
+	return found;
+}
+
+int io_epoll_wait_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd,
+			 unsigned int issue_flags)
+{
+	struct hlist_node *tmp;
+	struct io_kiocb *req;
+	int nr = 0;
+
+	io_ring_submit_lock(ctx, issue_flags);
+	hlist_for_each_entry_safe(req, tmp, &ctx->epoll_list, hash_node) {
+		if (!io_cancel_req_match(req, cd))
+			continue;
+		__io_epoll_wait_cancel(req);
+		nr++;
+	}
+	io_ring_submit_unlock(ctx, issue_flags);
+	return nr ?: -ENOENT;
+}
+
+static void io_epoll_retry(struct io_kiocb *req, struct io_tw_state *ts)
+{
+	int v;
+
+	do {
+		v = atomic_read(&req->poll_refs);
+		if (unlikely(v != 1)) {
+			if (WARN_ON_ONCE(!(v & IO_POLL_REF_MASK)))
+				return;
+			if (v & IO_POLL_CANCEL_FLAG) {
+				__io_epoll_cancel(req);
+				return;
+			}
+		}
+		v &= IO_POLL_REF_MASK;
+	} while (atomic_sub_return(v, &req->poll_refs) & IO_POLL_REF_MASK);
+
+	io_req_task_submit(req, ts);
+}
+
+static int io_epoll_execute(struct io_kiocb *req)
+{
+	struct io_epoll_wait *iew = io_kiocb_to_cmd(req, struct io_epoll_wait);
+
+	if (io_poll_get_ownership(req)) {
+		list_del_init_careful(&iew->wait.entry);
+		req->io_task_work.func = io_epoll_retry;
+		io_req_task_work_add(req);
+		return 1;
+	}
+
+	return 0;
+}
+
+static __cold int io_epoll_pollfree_wake(struct io_kiocb *req)
+{
+	struct io_epoll_wait *iew = io_kiocb_to_cmd(req, struct io_epoll_wait);
+
+	io_poll_mark_cancelled(req);
+	list_del_init_careful(&iew->wait.entry);
+	io_epoll_execute(req);
+	return 1;
+}
+
+static int io_epoll_wait_fn(struct wait_queue_entry *wait, unsigned mode,
+			    int sync, void *key)
+{
+	struct io_kiocb *req = wait->private;
+	__poll_t mask = key_to_poll(key);
+
+	if (unlikely(mask & POLLFREE))
+		return io_epoll_pollfree_wake(req);
+
+	return io_epoll_execute(req);
+}
+
+int io_epoll_wait_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
+{
+	struct io_epoll_wait *iew = io_kiocb_to_cmd(req, struct io_epoll_wait);
+
+	if (sqe->off || sqe->rw_flags || sqe->buf_index || sqe->splice_fd_in)
+		return -EINVAL;
+
+	iew->maxevents = READ_ONCE(sqe->len);
+	iew->events = u64_to_user_ptr(READ_ONCE(sqe->addr));
+
+	iew->wait.flags = 0;
+	iew->wait.private = req;
+	iew->wait.func = io_epoll_wait_fn;
+	INIT_LIST_HEAD(&iew->wait.entry);
+	atomic_set(&req->poll_refs, 0);
+	return 0;
+}
+
+int io_epoll_wait(struct io_kiocb *req, unsigned int issue_flags)
+{
+	struct io_epoll_wait *iew = io_kiocb_to_cmd(req, struct io_epoll_wait);
+	struct io_ring_ctx *ctx = req->ctx;
+	int ret;
+
+	io_ring_submit_lock(ctx, issue_flags);
+	hlist_add_head(&req->hash_node, &ctx->epoll_list);
+	io_ring_submit_unlock(ctx, issue_flags);
+
+	/*
+	 * Timeout is fake here, it doesn't indicate any kind of sleep time.
+	 * It's just set to something that is non-zero, so that wait queue
+	 * wakeup is armed if no events are available.
+	 */
+	ret = epoll_wait(req->file, iew->events, iew->maxevents, NULL, &iew->wait);
+	if (ret == -EIOCBQUEUED)
+		return IOU_ISSUE_SKIP_COMPLETE;
+	else if (ret < 0)
+		req_set_fail(req);
+	io_ring_submit_lock(ctx, issue_flags);
+	hlist_del_init(&req->hash_node);
+	io_ring_submit_unlock(ctx, issue_flags);
+	io_req_set_res(req, ret, 0);
+	return IOU_OK;
+}
diff --git a/io_uring/epoll.h b/io_uring/epoll.h
index 870cce11ba98..296940d89063 100644
--- a/io_uring/epoll.h
+++ b/io_uring/epoll.h
@@ -1,6 +1,28 @@
 // SPDX-License-Identifier: GPL-2.0
 
+#include "cancel.h"
+
 #if defined(CONFIG_EPOLL)
+int io_epoll_wait_cancel(struct io_ring_ctx *ctx, struct io_cancel_data *cd,
+			 unsigned int issue_flags);
+bool io_epoll_wait_remove_all(struct io_ring_ctx *ctx, struct io_uring_task *tctx,
+			      bool cancel_all);
+
 int io_epoll_ctl_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
 int io_epoll_ctl(struct io_kiocb *req, unsigned int issue_flags);
+int io_epoll_wait_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe);
+int io_epoll_wait(struct io_kiocb *req, unsigned int issue_flags);
+#else
+static inline bool io_epoll_wait_remove_all(struct io_ring_ctx *ctx,
+					    struct io_uring_task *tctx,
+					    bool cancel_all)
+{
+	return false;
+}
+static inline int io_epoll_wait_cancel(struct io_ring_ctx *ctx,
+				       struct io_cancel_data *cd,
+				       unsigned int issue_flags)
+{
+	return 0;
+}
 #endif
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 2f311aeb536f..a17abdbae7ee 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -93,6 +93,7 @@
 #include "notif.h"
 #include "waitid.h"
 #include "futex.h"
+#include "epoll.h"
 #include "napi.h"
 #include "uring_cmd.h"
 #include "msg_ring.h"
@@ -358,6 +359,9 @@ static __cold struct io_ring_ctx *io_ring_ctx_alloc(struct io_uring_params *p)
 	INIT_HLIST_HEAD(&ctx->waitid_list);
 #ifdef CONFIG_FUTEX
 	INIT_HLIST_HEAD(&ctx->futex_list);
+#endif
+#ifdef CONFIG_EPOLL
+	INIT_HLIST_HEAD(&ctx->epoll_list);
 #endif
 	INIT_DELAYED_WORK(&ctx->fallback_work, io_fallback_req_func);
 	INIT_WQ_LIST(&ctx->submit_state.compl_reqs);
@@ -3095,6 +3099,7 @@ static __cold bool io_uring_try_cancel_requests(struct io_ring_ctx *ctx,
 	ret |= io_poll_remove_all(ctx, tctx, cancel_all);
 	ret |= io_waitid_remove_all(ctx, tctx, cancel_all);
 	ret |= io_futex_remove_all(ctx, tctx, cancel_all);
+	ret |= io_epoll_wait_remove_all(ctx, tctx, cancel_all);
 	ret |= io_uring_try_cancel_uring_cmd(ctx, tctx, cancel_all);
 	mutex_unlock(&ctx->uring_lock);
 	ret |= io_kill_timeouts(ctx, tctx, cancel_all);
diff --git a/io_uring/opdef.c b/io_uring/opdef.c
index e8baef4e5146..44553a657476 100644
--- a/io_uring/opdef.c
+++ b/io_uring/opdef.c
@@ -514,6 +514,17 @@ const struct io_issue_def io_issue_defs[] = {
 		.async_size		= sizeof(struct io_async_msghdr),
 #else
 		.prep			= io_eopnotsupp_prep,
+#endif
+	},
+	[IORING_OP_EPOLL_WAIT] = {
+		.needs_file		= 1,
+		.unbound_nonreg_file	= 1,
+		.audit_skip		= 1,
+#if defined(CONFIG_EPOLL)
+		.prep			= io_epoll_wait_prep,
+		.issue			= io_epoll_wait,
+#else
+		.prep			= io_eopnotsupp_prep,
 #endif
 	},
 };
@@ -745,6 +756,9 @@ const struct io_cold_def io_cold_defs[] = {
 	[IORING_OP_LISTEN] = {
 		.name			= "LISTEN",
 	},
+	[IORING_OP_EPOLL_WAIT] = {
+		.name			= "EPOLL_WAIT",
+	},
 };
 
 const char *io_uring_get_opcode(u8 opcode)

From patchwork Mon Feb 3 16:23:47 2025
X-Patchwork-Submitter: Jens Axboe
X-Patchwork-Id: 13957808
16:31:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.166.50 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738600297; cv=none; b=Yv+qqaiR3lnqrcTOlu0gd0QLBphggbzLiqiq14OVJ0azyZrO8gYBq5kGo5o5jXOduM6yifgfeJe6YDTQ4hCCS9/K87mP9e9myP/+E5lfu2JU14MIynfDhACbdUu/019MvdBCulb4CKkgCQd+iSMOp/E2pX5FeeMo64gqRArMw1g= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738600297; c=relaxed/simple; bh=SiJFHOGW3I3zFqN3kp56SHJQkVmhHehxUSx0Yzr7T3E=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=SIVf5OHQ6qOA5cgKsZ3fjggEKtCQuyiu1vpJmDedhz1o06dWXahICAMwDcwk7a4jQ3BDvXufa5RjSOtqTFlYfeW+GcQ6eh1uUoDKT9/lYJ4/LOKwQ9JnHzN+ckrfTzLUJw+DROYYs5nEGNxK47S4fRk2c4ObeD16sD2iAvmeyNU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.dk; spf=pass smtp.mailfrom=kernel.dk; dkim=pass (2048-bit key) header.d=kernel-dk.20230601.gappssmtp.com header.i=@kernel-dk.20230601.gappssmtp.com header.b=eDun3atx; arc=none smtp.client-ip=209.85.166.50 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.dk Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=kernel.dk Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel-dk.20230601.gappssmtp.com header.i=@kernel-dk.20230601.gappssmtp.com header.b="eDun3atx" Received: by mail-io1-f50.google.com with SMTP id ca18e2360f4ac-844ce213af6so125468739f.1 for ; Mon, 03 Feb 2025 08:31:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel-dk.20230601.gappssmtp.com; s=20230601; t=1738600294; x=1739205094; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=0/fjiuRElsEUMm3xq1l9P9axtS1BBAUWMYirAwWU00g=; 
b=eDun3atxS7vxUznNqvzlHscmKmp+culWjhLBvNJamQbJNI3zSSllnwE6H20E5RhVlC GdeexBEgDdaMRNCM0Rp/xsAiKCBkwWytUDtmeOf2qjZjDClIS86rNr7jCmP9m3ThkGAD adgOQ+QR+e+dKkHlVgbT2H67SmahIgrsqwb9k7hFeB/HZSJJkEYywJAC9x+DKMbNhw95 8wkvfAy10YfwYLhmM+FvoT1up2XAi/FoX6I93aWMYrr/8NyxiYRy6iPpCSfrO5dS6Dru ylXEMNStmLLB4J1BgnhhXJy7PoJayyYxNNKHoX0VSpb0KzFwEgDZTYvF78m3dt+sg9UK bq+w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738600294; x=1739205094; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=0/fjiuRElsEUMm3xq1l9P9axtS1BBAUWMYirAwWU00g=; b=YfQziGeH+eM1kvyLRvMYrHEFMaz17LW1T8sYwnrpSldouebxuQKun+9CqLffUCz1DH 46qcBCxiMOzeqRZmjIMhYEqpMJA/L3iTH2/ODg2gzbpEZDgxInArAio+H+4je5Gh+DFT SJwZpihJ8w7RT1EKAVZoc97vljQ0afLyjhBDrVUgaJkZMTZABY3w96y+EOciwlwl9N+I dl1Oa7HwITj/vwzRCOdkV/LCWQy+kdyMhLps3P/7iLRtrWmw8WKSBncXI5pWdukgcrq0 cbqmbzXDOf+l2ewrg0Svn54ebiIXf7XZ97mJJ5U5C+9MX77u3F5Z+FGZjqRfEx7rHJs5 9tSA== X-Gm-Message-State: AOJu0Yz5sylBAoTFd6iYTjnREL+ID1/1dJ6LXPUi0qDsQiRMhw0W7xYU i+ZBvbK7lVdfDSULkAeTZngNbJcH/e9IA2GZkM4nsZrALGe/ejxQ6Hf+TtLjlLAPwhA1gqGUZjU ZT+E= X-Gm-Gg: ASbGnctM96J4l6IcXDwO/l/w5dvjd9zdDrp1YFlDJD9e3IhsX38uYV3aCNbDvR57rAe W34eqytVILmLTsW4U7Me/YC+s0/xHKKwTn/axaYWTFPsa5FhPxvktKlu+jYTAM/HaIi8VA12pDG WtzyFp9jkeLXb9E/FjgaOaNcU+7ifAgFB2ip9XkN2X2R+Hi7gU8yXc2mhFci8dQLS02DUGDQ0H3 GTVGKgBlS+nZ1qJKyPa8nFrbr9/pbybj2fw3ze/P+JoGF6mtwkDQ8hmVse/VD+y/eKQ2ar1tuDt ToSbiwrWlF0O4h2zw4k= X-Google-Smtp-Source: AGHT+IH2wV2mJH512YPftlISVyZWUOTcz9QVi4y2nD/702D7EXmTUdDrOmfIKi7tAhOGaOpiBEGhqg== X-Received: by 2002:a05:6602:3990:b0:844:debf:24dc with SMTP id ca18e2360f4ac-85411111991mr2269268639f.5.1738600292695; Mon, 03 Feb 2025 08:31:32 -0800 (PST) Received: from localhost.localdomain ([96.43.243.2]) by smtp.gmail.com with ESMTPSA id ca18e2360f4ac-854a16123c6sm243748139f.24.2025.02.03.08.31.31 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 
Feb 2025 08:31:31 -0800 (PST) From: Jens Axboe To: io-uring@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org, brauner@kernel.org, Jens Axboe Subject: [PATCH 9/9] io_uring/epoll: add multishot support for IORING_OP_EPOLL_WAIT Date: Mon, 3 Feb 2025 09:23:47 -0700 Message-ID: <20250203163114.124077-10-axboe@kernel.dk> X-Mailer: git-send-email 2.47.2 In-Reply-To: <20250203163114.124077-1-axboe@kernel.dk> References: <20250203163114.124077-1-axboe@kernel.dk> Precedence: bulk X-Mailing-List: io-uring@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 As with other multishot requests, submitting a multishot epoll wait request will keep it re-armed post the initial trigger. This allows multiple epoll wait completions per request submitted, every time events are available. If more completions are expected for this epoll wait request, then IORING_CQE_F_MORE will be set in the posted cqe->flags. For multishot, the request remains on the epoll callback waitqueue head. This means that epoll doesn't need to juggle the ep->lock writelock (and disable/enable IRQs) for each invocation of the reaping loop. That should translate into nice efficiency gains. Use by setting IORING_EPOLL_WAIT_MULTISHOT in the sqe->epoll_flags member. 
Signed-off-by: Jens Axboe
---
 include/uapi/linux/io_uring.h |  6 ++++++
 io_uring/epoll.c              | 40 ++++++++++++++++++++++++++---------
 2 files changed, 36 insertions(+), 10 deletions(-)

diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index a559e1e1544a..93f504b6d4ec 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -73,6 +73,7 @@ struct io_uring_sqe {
 		__u32 futex_flags;
 		__u32 install_fd_flags;
 		__u32 nop_flags;
+		__u32 epoll_flags;
 	};
 	__u64 user_data; /* data to be passed back at completion time */
 	/* pack this to avoid bogus arm OABI complaints */
@@ -405,6 +406,11 @@ enum io_uring_op {
 #define IORING_ACCEPT_DONTWAIT (1U << 1)
 #define IORING_ACCEPT_POLL_FIRST (1U << 2)

+/*
+ * epoll_wait flags, stored in sqe->epoll_flags
+ */
+#define IORING_EPOLL_WAIT_MULTISHOT (1U << 0)
+
 /*
  * IORING_OP_MSG_RING command types, stored in sqe->addr
  */
diff --git a/io_uring/epoll.c b/io_uring/epoll.c
index 2a9c679516c8..730f4b729f5b 100644
--- a/io_uring/epoll.c
+++ b/io_uring/epoll.c
@@ -24,6 +24,7 @@ struct io_epoll {
 struct io_epoll_wait {
 	struct file *file;
 	int maxevents;
+	int flags;
 	struct epoll_event __user *events;
 	struct wait_queue_entry wait;
 };
@@ -145,12 +146,15 @@ static void io_epoll_retry(struct io_kiocb *req, struct io_tw_state *ts)
 	io_req_task_submit(req, ts);
 }

-static int io_epoll_execute(struct io_kiocb *req)
+static int io_epoll_execute(struct io_kiocb *req, __poll_t mask)
 {
 	struct io_epoll_wait *iew = io_kiocb_to_cmd(req, struct io_epoll_wait);

 	if (io_poll_get_ownership(req)) {
-		list_del_init_careful(&iew->wait.entry);
+		if (mask & EPOLL_URING_WAKE)
+			req->flags &= ~REQ_F_APOLL_MULTISHOT;
+		if (!(req->flags & REQ_F_APOLL_MULTISHOT))
+			list_del_init_careful(&iew->wait.entry);
 		req->io_task_work.func = io_epoll_retry;
 		io_req_task_work_add(req);
 		return 1;
@@ -159,13 +163,13 @@ static int io_epoll_execute(struct io_kiocb *req)
 	return 0;
 }

-static __cold int io_epoll_pollfree_wake(struct io_kiocb *req)
+static __cold int io_epoll_pollfree_wake(struct io_kiocb *req, __poll_t mask)
 {
 	struct io_epoll_wait *iew = io_kiocb_to_cmd(req, struct io_epoll_wait);

 	io_poll_mark_cancelled(req);
 	list_del_init_careful(&iew->wait.entry);
-	io_epoll_execute(req);
+	io_epoll_execute(req, mask);
 	return 1;
 }

@@ -176,18 +180,23 @@ static int io_epoll_wait_fn(struct wait_queue_entry *wait, unsigned mode,
 	__poll_t mask = key_to_poll(key);

 	if (unlikely(mask & POLLFREE))
-		return io_epoll_pollfree_wake(req);
+		return io_epoll_pollfree_wake(req, mask);

-	return io_epoll_execute(req);
+	return io_epoll_execute(req, mask);
 }

 int io_epoll_wait_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 {
 	struct io_epoll_wait *iew = io_kiocb_to_cmd(req, struct io_epoll_wait);

-	if (sqe->off || sqe->rw_flags || sqe->buf_index || sqe->splice_fd_in)
+	if (sqe->off || sqe->buf_index || sqe->splice_fd_in)
 		return -EINVAL;

+	iew->flags = READ_ONCE(sqe->epoll_flags);
+	if (iew->flags & ~IORING_EPOLL_WAIT_MULTISHOT)
+		return -EINVAL;
+	else if (iew->flags & IORING_EPOLL_WAIT_MULTISHOT)
+		req->flags |= REQ_F_APOLL_MULTISHOT;
 	iew->maxevents = READ_ONCE(sqe->len);
 	iew->events = u64_to_user_ptr(READ_ONCE(sqe->addr));

@@ -195,6 +204,7 @@ int io_epoll_wait_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	iew->wait.private = req;
 	iew->wait.func = io_epoll_wait_fn;
 	INIT_LIST_HEAD(&iew->wait.entry);
+	INIT_HLIST_NODE(&req->hash_node);
 	atomic_set(&req->poll_refs, 0);
 	return 0;
 }
@@ -205,9 +215,11 @@ int io_epoll_wait(struct io_kiocb *req, unsigned int issue_flags)
 	struct io_ring_ctx *ctx = req->ctx;
 	int ret;

-	io_ring_submit_lock(ctx, issue_flags);
-	hlist_add_head(&req->hash_node, &ctx->epoll_list);
-	io_ring_submit_unlock(ctx, issue_flags);
+	if (hlist_unhashed(&req->hash_node)) {
+		io_ring_submit_lock(ctx, issue_flags);
+		hlist_add_head(&req->hash_node, &ctx->epoll_list);
+		io_ring_submit_unlock(ctx, issue_flags);
+	}

 	/*
 	 * Timeout is fake here, it doesn't indicate any kind of sleep time.
@@ -219,9 +231,17 @@ int io_epoll_wait(struct io_kiocb *req, unsigned int issue_flags)
 		return IOU_ISSUE_SKIP_COMPLETE;
 	else if (ret < 0)
 		req_set_fail(req);
+
+	if (ret >= 0 && req->flags & REQ_F_APOLL_MULTISHOT &&
+	    io_req_post_cqe(req, ret, IORING_CQE_F_MORE))
+		return IOU_ISSUE_SKIP_COMPLETE;
+
 	io_ring_submit_lock(ctx, issue_flags);
 	hlist_del_init(&req->hash_node);
 	io_ring_submit_unlock(ctx, issue_flags);
 	io_req_set_res(req, ret, 0);
+
+	if (issue_flags & IO_URING_F_MULTISHOT)
+		return IOU_STOP_MULTISHOT;
 	return IOU_OK;
 }
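
For readers following the series, here is a rough userspace-side model of
the two rules this patch introduces: unknown bits in sqe->epoll_flags are
rejected with -EINVAL, and a completion without IORING_CQE_F_MORE means
the multishot request has terminated and has to be resubmitted. This is
only a sketch. struct model_cqe and both model_* helpers are hypothetical
names made up for illustration; they are not kernel or liburing API. The
IORING_EPOLL_WAIT_MULTISHOT value is the one proposed in the hunk above,
and IORING_CQE_F_MORE is the existing uapi value.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Existing uapi completion flag: more CQEs will follow for this request. */
#define IORING_CQE_F_MORE		(1U << 1)
/* Value proposed by this patch for sqe->epoll_flags. */
#define IORING_EPOLL_WAIT_MULTISHOT	(1U << 0)

/* Hypothetical stand-in for struct io_uring_cqe, reduced to what is used. */
struct model_cqe {
	int32_t res;	/* number of events reaped, or a negative errno */
	uint32_t flags;	/* IORING_CQE_F_* bits */
};

/*
 * Models the flag check added to io_epoll_wait_prep(): any bit other
 * than IORING_EPOLL_WAIT_MULTISHOT fails the request with -EINVAL.
 */
static int model_check_epoll_flags(uint32_t epoll_flags)
{
	if (epoll_flags & ~IORING_EPOLL_WAIT_MULTISHOT)
		return -EINVAL;
	return 0;
}

/*
 * Consumer-side rule: as long as IORING_CQE_F_MORE is set, the request
 * is still armed. Once it is clear (final completion or error), the
 * application has to submit a new request to keep receiving events.
 */
static bool model_needs_rearm(const struct model_cqe *cqe)
{
	return !(cqe->flags & IORING_CQE_F_MORE);
}
```

An application loop would call model_needs_rearm() on every reaped CQE
for the epoll wait request and queue a fresh SQE when it returns true.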