[v2] eventfd: convert to ->write_iter()

Message ID ed4484a3dc8297296bfcd16810f7dc1976d6f7d0.1605808477.git.mkubecek@suse.cz (mailing list archive)
State New, archived

Commit Message

Michal Kubecek Nov. 19, 2020, 6 p.m. UTC
While eventfd ->read() callback was replaced by ->read_iter() recently by
commit 12aceb89b0bc ("eventfd: convert to f_op->read_iter()"), ->write()
was not replaced.

Convert also ->write() to ->write_iter() to make the interface more
consistent and allow non-blocking writes from e.g. io_uring. Also
reorganize the code and return value handling in a similar way as it was
done in eventfd_read().

v2: different reasoning in commit message (no code change)

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
---
 fs/eventfd.c | 43 +++++++++++++++++++++----------------------
 1 file changed, 21 insertions(+), 22 deletions(-)

Comments

Christoph Hellwig Nov. 19, 2020, 6:03 p.m. UTC | #1
On Thu, Nov 19, 2020 at 07:00:19PM +0100, Michal Kubecek wrote:
> While eventfd ->read() callback was replaced by ->read_iter() recently by
> commit 12aceb89b0bc ("eventfd: convert to f_op->read_iter()"), ->write()
> was not replaced.
> 
> Convert also ->write() to ->write_iter() to make the interface more
> consistent and allow non-blocking writes from e.g. io_uring. Also
> reorganize the code and return value handling in a similar way as it was
> done in eventfd_read().

But this patch does not allow non-blocking writes.  I'm really
suspicious as you're obviously trying to hide something from us.

Michal Kubecek Nov. 19, 2020, 6:46 p.m. UTC | #2
On Thu, Nov 19, 2020 at 06:03:15PM +0000, Christoph Hellwig wrote:
> On Thu, Nov 19, 2020 at 07:00:19PM +0100, Michal Kubecek wrote:
> > While eventfd ->read() callback was replaced by ->read_iter() recently by
> > commit 12aceb89b0bc ("eventfd: convert to f_op->read_iter()"), ->write()
> > was not replaced.
> > 
> > Convert also ->write() to ->write_iter() to make the interface more
> > consistent and allow non-blocking writes from e.g. io_uring. Also
> > reorganize the code and return value handling in a similar way as it was
> > done in eventfd_read().
> 
> But this patch does not allow non-blocking writes.  I'm really
> suspicious as you're obviously trying to hide something from us.

I already explained what my original motivation was and explained that
it's no longer the case as the third party module that inspired me to
take a look at this can be easily patched not to need kernel_write() to
eventfd - and that it almost certainly will have to be patched that way
anyway. BTW, the reason I did not mention out of tree modules in the
commit message was exactly this: I suspected that any mention of them
could be a red flag for some people.

I believed - and I still believe - that this patch is useful for other
reasons and Jens added another. Therefore I resubmitted with commit
message rewritten as requested, even if I don't need it personally. I'm
not hiding anything and I don't have time for playing your political
games and suffer your attacks. If they are more important than improving
kernel code, so be it. I'm annoyed enough and I don't care any more.

Michal Kubecek

Jens Axboe Nov. 19, 2020, 6:48 p.m. UTC | #3
On 11/19/20 11:03 AM, Christoph Hellwig wrote:
> On Thu, Nov 19, 2020 at 07:00:19PM +0100, Michal Kubecek wrote:
>> While eventfd ->read() callback was replaced by ->read_iter() recently by
>> commit 12aceb89b0bc ("eventfd: convert to f_op->read_iter()"), ->write()
>> was not replaced.
>>
>> Convert also ->write() to ->write_iter() to make the interface more
>> consistent and allow non-blocking writes from e.g. io_uring. Also
>> reorganize the code and return value handling in a similar way as it was
>> done in eventfd_read().
> 
> But this patch does not allow non-blocking writes.

What am I missing here? He checks the file and IOCB non-block flags,
and returns -EAGAIN if there's no space. If not, it waits and schedules.
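
The -EAGAIN path described above can be observed from userspace. The
following is only a minimal sketch, assuming a Linux system with
eventfd(2); the helper name eventfd_overflow_errno() is hypothetical and
not part of the patch. It exercises the O_NONBLOCK half of the check
(the IOCB_NOWAIT half would need an io_uring or similar submitter, which
is not shown here).

```c
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): returns the errno seen when
 * a non-blocking write would overflow the eventfd counter, 0 if that
 * write unexpectedly succeeds, or -1 if the setup fails.
 */
int eventfd_overflow_errno(void)
{
	uint64_t val;
	int fd, err = 0;

	fd = eventfd(0, EFD_NONBLOCK);
	if (fd < 0)
		return -1;

	/* Fill the counter to its maximum value, ULLONG_MAX - 1. */
	val = UINT64_MAX - 1;
	if (write(fd, &val, sizeof(val)) != (ssize_t)sizeof(val)) {
		close(fd);
		return -1;
	}

	/* No room left: with O_NONBLOCK the write fails instead of sleeping. */
	val = 1;
	if (write(fd, &val, sizeof(val)) < 0)
		err = errno;

	close(fd);
	return err;
}
```

On a kernel with the semantics shown in the patch, the second write
should fail with EAGAIN. Without EFD_NONBLOCK the same write would sleep
until a reader drains the counter.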
Michal Kubecek Nov. 21, 2020, 5:07 p.m. UTC | #4
On Thu, Nov 19, 2020 at 07:46:10PM +0100, Michal Kubecek wrote:
> On Thu, Nov 19, 2020 at 06:03:15PM +0000, Christoph Hellwig wrote:
> > On Thu, Nov 19, 2020 at 07:00:19PM +0100, Michal Kubecek wrote:
> > > While eventfd ->read() callback was replaced by ->read_iter() recently by
> > > commit 12aceb89b0bc ("eventfd: convert to f_op->read_iter()"), ->write()
> > > was not replaced.
> > > 
> > > Convert also ->write() to ->write_iter() to make the interface more
> > > consistent and allow non-blocking writes from e.g. io_uring. Also
> > > reorganize the code and return value handling in a similar way as it was
> > > done in eventfd_read().
> > 
> > But this patch does not allow non-blocking writes.  I'm really
> > suspicious as you're obviously trying to hide something from us.
> 
> I already explained what my original motivation was and explained that
> it's no longer the case as the third party module that inspired me to
> take a look at this can be easily patched not to need kernel_write() to
> eventfd - and that it almost certainly will have to be patched that way
> anyway. BTW, the reason I did not mention out of tree modules in the
> commit message was exactly this: I suspected that any mention of them
> could be a red flag for some people.
> 
> I believed - and I still believe - that this patch is useful for other
> reasons and Jens added another. Therefore I resubmitted with commit
> message rewritten as requested, even if I don't need it personally. I'm
> not hiding anything and I don't have time for playing your political
> games and suffer your attacks. If they are more important than improving
> kernel code, so be it. I'm annoyed enough and I don't care any more.

Just a few hours later, a new version of the product was released where
the module still calls file->f_op->write() directly as it did before but
they use a dedicated userspace buffer for this kernel write. Whatever
I think about their solution, the result is that right now their module
works with current mainline but it would break with this patch. So much
for hidden agenda...

For the record, I still believe this patch is the right thing to do.

Michal Kubecek
Patch

diff --git a/fs/eventfd.c b/fs/eventfd.c
index df466ef81ddd..35973d216847 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -261,35 +261,36 @@  static ssize_t eventfd_read(struct kiocb *iocb, struct iov_iter *to)
 	return sizeof(ucnt);
 }
 
-static ssize_t eventfd_write(struct file *file, const char __user *buf, size_t count,
-			     loff_t *ppos)
+static ssize_t eventfd_write(struct kiocb *iocb, struct iov_iter *from)
 {
+	struct file *file = iocb->ki_filp;
 	struct eventfd_ctx *ctx = file->private_data;
-	ssize_t res;
 	__u64 ucnt;
 	DECLARE_WAITQUEUE(wait, current);
 
-	if (count < sizeof(ucnt))
+	if (iov_iter_count(from) < sizeof(ucnt))
 		return -EINVAL;
-	if (copy_from_user(&ucnt, buf, sizeof(ucnt)))
+	if (unlikely(!copy_from_iter_full(&ucnt, sizeof(ucnt), from)))
 		return -EFAULT;
 	if (ucnt == ULLONG_MAX)
 		return -EINVAL;
 	spin_lock_irq(&ctx->wqh.lock);
-	res = -EAGAIN;
-	if (ULLONG_MAX - ctx->count > ucnt)
-		res = sizeof(ucnt);
-	else if (!(file->f_flags & O_NONBLOCK)) {
+	if (ULLONG_MAX - ctx->count <= ucnt) {
+		if ((file->f_flags & O_NONBLOCK) ||
+		    (iocb->ki_flags & IOCB_NOWAIT)) {
+			spin_unlock_irq(&ctx->wqh.lock);
+			return -EAGAIN;
+		}
 		__add_wait_queue(&ctx->wqh, &wait);
-		for (res = 0;;) {
+		for (;;) {
 			set_current_state(TASK_INTERRUPTIBLE);
-			if (ULLONG_MAX - ctx->count > ucnt) {
-				res = sizeof(ucnt);
+			if (ULLONG_MAX - ctx->count > ucnt)
 				break;
-			}
 			if (signal_pending(current)) {
-				res = -ERESTARTSYS;
-				break;
+				__remove_wait_queue(&ctx->wqh, &wait);
+				__set_current_state(TASK_RUNNING);
+				spin_unlock_irq(&ctx->wqh.lock);
+				return -ERESTARTSYS;
 			}
 			spin_unlock_irq(&ctx->wqh.lock);
 			schedule();
@@ -298,14 +299,12 @@  static ssize_t eventfd_write(struct file *file, const char __user *buf, size_t c
 		__remove_wait_queue(&ctx->wqh, &wait);
 		__set_current_state(TASK_RUNNING);
 	}
-	if (likely(res > 0)) {
-		ctx->count += ucnt;
-		if (waitqueue_active(&ctx->wqh))
-			wake_up_locked_poll(&ctx->wqh, EPOLLIN);
-	}
+	ctx->count += ucnt;
+	if (waitqueue_active(&ctx->wqh))
+		wake_up_locked_poll(&ctx->wqh, EPOLLIN);
 	spin_unlock_irq(&ctx->wqh.lock);
 
-	return res;
+	return sizeof(ucnt);
 }
 
 #ifdef CONFIG_PROC_FS
@@ -328,7 +327,7 @@  static const struct file_operations eventfd_fops = {
 	.release	= eventfd_release,
 	.poll		= eventfd_poll,
 	.read_iter	= eventfd_read,
-	.write		= eventfd_write,
+	.write_iter	= eventfd_write,
 	.llseek		= noop_llseek,
 };
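
The two -EINVAL checks at the top of eventfd_write() are unchanged in
substance by the conversion and are visible from userspace. Below is a
rough sketch, assuming Linux eventfd(2); eventfd_try_write() is a
hypothetical helper name used only for illustration, and len is assumed
not to exceed sizeof(uint64_t).

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): write the first len bytes
 * of val to a fresh eventfd. Returns 0 on success, the errno of a
 * failed write, or -1 if the eventfd could not be created.
 * The caller must keep len <= sizeof(val).
 */
int eventfd_try_write(uint64_t val, size_t len)
{
	int fd, err = 0;

	fd = eventfd(0, EFD_NONBLOCK);
	if (fd < 0)
		return -1;

	if (write(fd, &val, len) < 0)
		err = errno;

	close(fd);
	return err;
}
```

With the checks shown in the patch, a buffer shorter than 8 bytes, as in
eventfd_try_write(1, 4), and the reserved value, as in
eventfd_try_write(UINT64_MAX, 8), should both yield EINVAL, while
eventfd_try_write(1, 8) should succeed.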