eventfd: avoid unnecessary wakeups in eventfd_write()

Message ID: tencent_DC522F05F54C72A6EF3193F9313CD756350A@qq.com (mailing list archive)
State: New, archived
Headers:
        Return-Path: <linux-fsdevel-owner@vger.kernel.org>
        Message-ID: <tencent_DC522F05F54C72A6EF3193F9313CD756350A@qq.com>
        From: wenyang.linux@foxmail.com
        To: Alexander Viro <viro@zeniv.linux.org.uk>,
                Jens Axboe <axboe@kernel.dk>,
                Christian Brauner <brauner@kernel.org>
        Cc: Wen Yang <wenyang.linux@foxmail.com>,
                Christoph Hellwig <hch@lst.de>, Dylan Yudaken <dylany@fb.com>,
                David Woodhouse <dwmw@amazon.co.uk>,
                Matthew Wilcox <willy@infradead.org>,
                linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
        Subject: [PATCH] eventfd: avoid unnecessary wakeups in eventfd_write()
        Date: Thu, 13 Jul 2023 00:42:32 +0800
        MIME-Version: 1.0
        Content-Transfer-Encoding: 8bit
        Precedence: bulk
Series: eventfd: avoid unnecessary wakeups in eventfd_write()

Commit Message

Wen Yang July 12, 2023, 4:42 p.m. UTC

From: Wen Yang <wenyang.linux@foxmail.com>

In eventfd_write(), when ucnt is 0 and ctx->count is also 0,
current->in_eventfd will be set to 1, which may affect eventfd_signal(),
and unnecessary wakeups will also be performed.

Fix this issue by ensuring that ctx->count is not zero.

Signed-off-by: Wen Yang <wenyang.linux@foxmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dylan Yudaken <dylany@fb.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 fs/eventfd.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

Comments

Christian Brauner July 13, 2023, 8:56 a.m. UTC | #1

On Thu, Jul 13, 2023 at 12:42:32AM +0800, wenyang.linux@foxmail.com wrote:
> From: Wen Yang <wenyang.linux@foxmail.com>
> 
> In eventfd_write(), when ucnt is 0 and ctx->count is also 0,
> current->in_eventfd will be set to 1, which may affect eventfd_signal(),
> and unnecessary wakeups will also be performed.
> 
> Fix this issue by ensuring that ctx->count is not zero.
> 
> Signed-off-by: Wen Yang <wenyang.linux@foxmail.com>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Dylan Yudaken <dylany@fb.com>
> Cc: David Woodhouse <dwmw@amazon.co.uk>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: linux-fsdevel@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  fs/eventfd.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/eventfd.c b/fs/eventfd.c
> index 33a918f9566c..254b18ff0e00 100644
> --- a/fs/eventfd.c
> +++ b/fs/eventfd.c
> @@ -281,10 +281,12 @@ static ssize_t eventfd_write(struct file *file, const char __user *buf, size_t c
>  	}
>  	if (likely(res > 0)) {
>  		ctx->count += ucnt;
> -		current->in_eventfd = 1;
> -		if (waitqueue_active(&ctx->wqh))
> -			wake_up_locked_poll(&ctx->wqh, EPOLLIN);
> -		current->in_eventfd = 0;
> +		if (ctx->count) {
> +			current->in_eventfd = 1;
> +			if (waitqueue_active(&ctx->wqh))
> +				wake_up_locked_poll(&ctx->wqh, EPOLLIN);
> +			current->in_eventfd = 0;
> +		}
>  	}
>  	spin_unlock_irq(&ctx->wqh.lock);

I don't think we can do this. Consider the following:

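        __u64 w = 0;    /* zero write: the counter is left unchanged */
        struct pollfd pfd = {
                .events = POLLIN | POLLOUT,
        };

        int fd = eventfd(0, 0);
        if (fd < 0)
                return -1;
        pfd.fd = fd;    /* poll the eventfd we just created */

        write(fd, &w, sizeof(__u64));

        poll(&pfd, 1, -1);

        printf("%d\n", pfd.revents & POLLOUT);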

Currently, eventfd_poll() will do:

        ULLONG_MAX - 1 > ctx->count

informing pollers with POLLOUT that the eventfd is writable, iow, that
the count still has room before it reaches its maximum.
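To make the check above concrete, here is a small userspace model of how the readiness mask could be derived from the counter. This is a sketch, not the kernel code: the function name eventfd_poll_mask_model() is hypothetical, the POLLIN branch is my assumption about the read side, and only the POLLOUT comparison is taken from this thread. Any other bits the kernel may report are omitted.

```c
#include <limits.h>
#include <poll.h>

/*
 * Userspace model (NOT kernel code) of the readiness mask derived
 * from an eventfd counter value. Hypothetical helper for illustration.
 */
static short eventfd_poll_mask_model(unsigned long long count)
{
	short events = 0;

	/* Assumption: readable as soon as the counter is nonzero. */
	if (count > 0)
		events |= POLLIN;

	/* From the thread: writable while a write of at least 1 still fits. */
	if (ULLONG_MAX - 1 > count)
		events |= POLLOUT;

	return events;
}
```

With a counter of 0 the model reports POLLOUT only, which is the state a zero write leaves a fresh eventfd in.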

After your change such POLLOUT waiters will hang forever even though the
eventfd is writable.

So currently, a zero write on an eventfd can be used to inform another
process that they can write. Your change breaks this completely.

Callers that don't want to be woken up on zero writes should just not
set POLLOUT:

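        __u64 w = 0;    /* zero write: the counter is left unchanged */
        struct pollfd pfd = {
                .events = POLLIN,
        };

        int fd = eventfd(0, 0);
        if (fd < 0)
                return -1;
        pfd.fd = fd;    /* poll the eventfd we just created */

        write(fd, &w, sizeof(__u64));

        poll(&pfd, 1, -1);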

This will wait until someone actually writes a nonzero value.
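The two snippets can be compared side by side with a helper like the one below. It is only a sketch under assumptions: revents_after_write() is a hypothetical name, and it polls with a timeout of 0 instead of -1 so that it reports the current readiness and never blocks. It therefore shows what poll() returns after a write, not whether a sleeping waiter gets woken.

```c
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `value` to a fresh eventfd, then poll it
 * for `events` without blocking. Returns revents, or -1 on error.
 */
static int revents_after_write(uint64_t value, short events)
{
	struct pollfd pfd = { .events = events };
	int ret;

	pfd.fd = eventfd(0, 0);
	if (pfd.fd < 0)
		return -1;

	if (write(pfd.fd, &value, sizeof(value)) != (ssize_t)sizeof(value)) {
		close(pfd.fd);
		return -1;
	}

	/* Timeout 0 (an assumption of this sketch): never sleep. */
	ret = poll(&pfd, 1, 0);
	close(pfd.fd);

	return ret < 0 ? -1 : pfd.revents;
}
```

Calling revents_after_write(0, POLLIN | POLLOUT) corresponds to the first snippet and revents_after_write(0, POLLIN) to the second; passing a nonzero value shows the POLLIN case.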

Patch

diff --git a/fs/eventfd.c b/fs/eventfd.c
index 33a918f9566c..254b18ff0e00 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -281,10 +281,12 @@ static ssize_t eventfd_write(struct file *file, const char __user *buf, size_t c
 	}
 	if (likely(res > 0)) {
 		ctx->count += ucnt;
-		current->in_eventfd = 1;
-		if (waitqueue_active(&ctx->wqh))
-			wake_up_locked_poll(&ctx->wqh, EPOLLIN);
-		current->in_eventfd = 0;
+		if (ctx->count) {
+			current->in_eventfd = 1;
+			if (waitqueue_active(&ctx->wqh))
+				wake_up_locked_poll(&ctx->wqh, EPOLLIN);
+			current->in_eventfd = 0;
+		}
 	}
 	spin_unlock_irq(&ctx->wqh.lock);
