Message ID | ed4484a3dc8297296bfcd16810f7dc1976d6f7d0.1605808477.git.mkubecek@suse.cz (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] eventfd: convert to ->write_iter() | expand |
On Thu, Nov 19, 2020 at 07:00:19PM +0100, Michal Kubecek wrote: > While eventfd ->read() callback was replaced by ->read_iter() recently by > commit 12aceb89b0bc ("eventfd: convert to f_op->read_iter()"), ->write() > was not replaced. > > Convert also ->write() to ->write_iter() to make the interface more > consistent and allow non-blocking writes from e.g. io_uring. Also > reorganize the code and return value handling in a similar way as it was > done in eventfd_read(). But this patch does not allow non-blocking writes. I'm really suspicious as you're obviously trying to hide something from us.
On Thu, Nov 19, 2020 at 06:03:15PM +0000, Christoph Hellwig wrote: > On Thu, Nov 19, 2020 at 07:00:19PM +0100, Michal Kubecek wrote: > > While eventfd ->read() callback was replaced by ->read_iter() recently by > > commit 12aceb89b0bc ("eventfd: convert to f_op->read_iter()"), ->write() > > was not replaced. > > > > Convert also ->write() to ->write_iter() to make the interface more > > consistent and allow non-blocking writes from e.g. io_uring. Also > > reorganize the code and return value handling in a similar way as it was > > done in eventfd_read(). > > But this patch does not allow non-blocking writes. I'm really > suspicious as you're obviously trying to hide something from us. I already explained what my original motivation was and explained that it's no longer the case as the third party module that inspired me to take a look at this can be easily patched not to need kernel_write() to eventfd - and that it almost certainly will have to be patched that way anyway. BtW, the reason I did not mention out of tree modules in the commit message was exactly this: I suspected that any mention of them could be a red flag for some people. I believed - and I still believe - that this patch is useful for other reasons and Jens added another. Therefore I resubmitted with commit message rewritten as requested, even if I don't need it personally. I'm not hiding anything and I don't have time for playing your political games and suffer your attacks. If they are more important than improving kernel code, so be it. I'm annoyed enough and I don't care any more. Michal Kubecek
On 11/19/20 11:03 AM, Christoph Hellwig wrote: > On Thu, Nov 19, 2020 at 07:00:19PM +0100, Michal Kubecek wrote: >> While eventfd ->read() callback was replaced by ->read_iter() recently by >> commit 12aceb89b0bc ("eventfd: convert to f_op->read_iter()"), ->write() >> was not replaced. >> >> Convert also ->write() to ->write_iter() to make the interface more >> consistent and allow non-blocking writes from e.g. io_uring. Also >> reorganize the code and return value handling in a similar way as it was >> done in eventfd_read(). > > But this patch does not allow non-blocking writes. What am I missing here? He checks the file and IOCB non-block flags, and returns -EAGAIN if there's no space. If not, it waits and schedules.
On Thu, Nov 19, 2020 at 07:46:10PM +0100, Michal Kubecek wrote: > On Thu, Nov 19, 2020 at 06:03:15PM +0000, Christoph Hellwig wrote: > > On Thu, Nov 19, 2020 at 07:00:19PM +0100, Michal Kubecek wrote: > > > While eventfd ->read() callback was replaced by ->read_iter() recently by > > > commit 12aceb89b0bc ("eventfd: convert to f_op->read_iter()"), ->write() > > > was not replaced. > > > > > > Convert also ->write() to ->write_iter() to make the interface more > > > consistent and allow non-blocking writes from e.g. io_uring. Also > > > reorganize the code and return value handling in a similar way as it was > > > done in eventfd_read(). > > > > But this patch does not allow non-blocking writes. I'm really > > suspicious as you're obviously trying to hide something from us. > > I already explained what my original motivation was and explained that > it's no longer the case as the third party module that inspired me to > take a look at this can be easily patched not to need kernel_write() to > eventfd - and that it almost certainly will have to be patched that way > anyway. BtW, the reason I did not mention out of tree modules in the > commit message was exactly this: I suspected that any mention of them > could be a red flag for some people. > > I believed - and I still believe - that this patch is useful for other > reasons and Jens added another. Therefore I resubmitted with commit > message rewritten as requested, even if I don't need it personally. I'm > not hiding anything and I don't have time for playing your political > games and suffer your attacks. If they are more important than improving > kernel code, so be it. I'm annoyed enough and I don't care any more. Just few hours later, a new version of the product was released where the module still calls file->f_op->write() directly as it did before but they use a dedicated userspace buffer for this kernel write. Whatever I think about their solution, the result is that right now their module works with current mainline but it would break with this patch. So much for hidden agenda... For the record, I still believe this patch is the right thing to do. Michal Kubecek
diff --git a/fs/eventfd.c b/fs/eventfd.c index df466ef81ddd..35973d216847 100644 --- a/fs/eventfd.c +++ b/fs/eventfd.c @@ -261,35 +261,36 @@ static ssize_t eventfd_read(struct kiocb *iocb, struct iov_iter *to) return sizeof(ucnt); } -static ssize_t eventfd_write(struct file *file, const char __user *buf, size_t count, - loff_t *ppos) +static ssize_t eventfd_write(struct kiocb *iocb, struct iov_iter *from) { + struct file *file = iocb->ki_filp; struct eventfd_ctx *ctx = file->private_data; - ssize_t res; __u64 ucnt; DECLARE_WAITQUEUE(wait, current); - if (count < sizeof(ucnt)) + if (iov_iter_count(from) < sizeof(ucnt)) return -EINVAL; - if (copy_from_user(&ucnt, buf, sizeof(ucnt))) + if (unlikely(!copy_from_iter_full(&ucnt, sizeof(ucnt), from))) return -EFAULT; if (ucnt == ULLONG_MAX) return -EINVAL; spin_lock_irq(&ctx->wqh.lock); - res = -EAGAIN; - if (ULLONG_MAX - ctx->count > ucnt) - res = sizeof(ucnt); - else if (!(file->f_flags & O_NONBLOCK)) { + if (ULLONG_MAX - ctx->count <= ucnt) { + if ((file->f_flags & O_NONBLOCK) || + (iocb->ki_flags & IOCB_NOWAIT)) { + spin_unlock_irq(&ctx->wqh.lock); + return -EAGAIN; + } __add_wait_queue(&ctx->wqh, &wait); - for (res = 0;;) { + for (;;) { set_current_state(TASK_INTERRUPTIBLE); - if (ULLONG_MAX - ctx->count > ucnt) { - res = sizeof(ucnt); + if (ULLONG_MAX - ctx->count > ucnt) break; - } if (signal_pending(current)) { - res = -ERESTARTSYS; - break; + __remove_wait_queue(&ctx->wqh, &wait); + __set_current_state(TASK_RUNNING); + spin_unlock_irq(&ctx->wqh.lock); + return -ERESTARTSYS; } spin_unlock_irq(&ctx->wqh.lock); schedule(); @@ -298,14 +299,12 @@ static ssize_t eventfd_write(struct file *file, const char __user *buf, size_t c __remove_wait_queue(&ctx->wqh, &wait); __set_current_state(TASK_RUNNING); } - if (likely(res > 0)) { - ctx->count += ucnt; - if (waitqueue_active(&ctx->wqh)) - wake_up_locked_poll(&ctx->wqh, EPOLLIN); - } + ctx->count += ucnt; + if (waitqueue_active(&ctx->wqh)) + wake_up_locked_poll(&ctx->wqh, EPOLLIN); spin_unlock_irq(&ctx->wqh.lock); - return res; + return sizeof(ucnt); } #ifdef CONFIG_PROC_FS @@ -328,7 +327,7 @@ static const struct file_operations eventfd_fops = { .release = eventfd_release, .poll = eventfd_poll, .read_iter = eventfd_read, - .write = eventfd_write, + .write_iter = eventfd_write, .llseek = noop_llseek, };
While eventfd ->read() callback was replaced by ->read_iter() recently by commit 12aceb89b0bc ("eventfd: convert to f_op->read_iter()"), ->write() was not replaced. Convert also ->write() to ->write_iter() to make the interface more consistent and allow non-blocking writes from e.g. io_uring. Also reorganize the code and return value handling in a similar way as it was done in eventfd_read(). v2: different reasoning in commit message (no code change) Signed-off-by: Michal Kubecek <mkubecek@suse.cz> --- fs/eventfd.c | 43 +++++++++++++++++++++---------------------- 1 file changed, 21 insertions(+), 22 deletions(-)