From patchwork Tue Aug 22 09:45:07 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Viresh Kumar X-Patchwork-Id: 13360439 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 3E9BAEE4993 for ; Tue, 22 Aug 2023 09:45:41 +0000 (UTC) Received: from list by lists.xenproject.org with outflank-mailman.588335.919832 (Exim 4.92) (envelope-from ) id 1qYNwt-00048N-9E; Tue, 22 Aug 2023 09:45:19 +0000 X-Outflank-Mailman: Message body and most headers restored to incoming version Received: by outflank-mailman (output) from mailman id 588335.919832; Tue, 22 Aug 2023 09:45:19 +0000 Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1qYNwt-00048G-6E; Tue, 22 Aug 2023 09:45:19 +0000 Received: by outflank-mailman (input) for mailman id 588335; Tue, 22 Aug 2023 09:45:17 +0000 Received: from se1-gles-flk1-in.inumbo.com ([94.247.172.50] helo=se1-gles-flk1.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1qYNwr-00048A-OT for xen-devel@lists.xenproject.org; Tue, 22 Aug 2023 09:45:17 +0000 Received: from mail-ot1-x333.google.com (mail-ot1-x333.google.com [2607:f8b0:4864:20::333]) by se1-gles-flk1.inumbo.com (Halon) with ESMTPS id 973d7542-40d0-11ee-9b0c-b553b5be7939; Tue, 22 Aug 2023 11:45:13 +0200 (CEST) Received: by mail-ot1-x333.google.com with SMTP id 46e09a7af769-6bd77864e05so3077494a34.3 for ; Tue, 22 Aug 2023 02:45:13 -0700 (PDT) Received: from localhost ([122.172.87.195]) by smtp.gmail.com with ESMTPSA id x33-20020a17090a6c2400b00267d9f4d340sm9977765pjj.44.2023.08.22.02.45.11 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 22 Aug 2023 02:45:11 -0700 (PDT) X-BeenThere: xen-devel@lists.xenproject.org List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Sender: "Xen-devel" X-Inumbo-ID: 973d7542-40d0-11ee-9b0c-b553b5be7939 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1692697512; x=1693302312; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:from:to:cc:subject:date:message-id:reply-to; bh=9VfBCvp+lAdhb3vS0UcFySyoen6a3yAs2OXAZaKdJ+8=; b=tx52sTenxUlwKNo0bL0jQLNA7dojDYxud0ywtmQOi6Wsa/LpFYFhGzcDQR/eEUiFP1 Kai5Bt5dm9cKZbQyLtvI5Nhgh9ai0vS1wWoDQ9+SmbQbKgp4gbVNLiYM3XYP5lKAQ7Tb yDl/uMs2L4qE4evH0sF2ttPMXVZPohJhaCvNwsGNsEHim73wn87QoXHUsVGmX3h+S7nb 0Kp6vnn04bjxdN3NqB2slka9SQapzC7ic5tstRPXEVFb44VSGWtWj2+WxfH8xOz/RlHJ pldgSZ7yyrE4+ZRUzqBljgCP56J0DCpXFE/WOCf/2m45Mi6ElQpkZu5xLadEvVzQAUU6 E2xA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692697512; x=1693302312; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=9VfBCvp+lAdhb3vS0UcFySyoen6a3yAs2OXAZaKdJ+8=; b=AhoGq/ldM2sKjzHURZ24PUthfpOdgp2W95Pb9FYX+EduoCG2lDIilrigaNAlShcA2R KlbA1R4q38JiyV0m8GNtn+uGzYj234BL48Ad8EO2EiOiTaVEyvvwzgWgbRSpj849WY3o Fmkndsil8bVTzgl4wlbeMLHhL3snus8kUt2u61Oy9L7tsHxEw0uFUnhbTZQsFqBSR8Uk PoRNy5VRJWXlRTY8Qx6Uvig3OjLLJ9XaKPKm69/7pSDaUGghWjWWk+C4ywCGd06Hjc8y OBEWQNNg6O/yYS/aNlS4dDfqS3VsReB+eax7uhsNql92e/RDg0Mj6C4uBY/nc2MwIcYN Nwag== X-Gm-Message-State: AOJu0YzMAJ03Y879UpwDRxVxuQU7G00XCy64gvXspbfbSFR5lyaVcs/G RRcA1UzEcRhvswa1+c9fDcAlVQ== X-Google-Smtp-Source: AGHT+IE+iHecosexqebaSGs2cp0IYCYy55vkn9aeuegJ7zhQLKrPZoKbcw/+cfe+SShEk+jeSVs9tw== X-Received: by 2002:a05:6870:a9a8:b0:1c1:2a3e:8e1f with SMTP id ep40-20020a056870a9a800b001c12a3e8e1fmr11554193oab.30.1692697512302; Tue, 22 Aug 2023 02:45:12 -0700 (PDT) From: Viresh Kumar To: Juergen Gross , Stefano Stabellini , Oleksandr Tyshchenko Cc: Viresh Kumar , Vincent Guittot , =?utf-8?q?Alex_Benn=C3=A9e?= , stratos-dev@op-lists.linaro.org, Erik Schilling , Manos Pitsidianakis , Mathieu Poirier , xen-devel@lists.xenproject.org, linux-kernel@vger.kernel.org Subject: [PATCH V5] xen: privcmd: Add support for irqfd Date: Tue, 22 Aug 2023 15:15:07 +0530 Message-Id: <8e724ac1f50c2bc1eb8da9b3ff6166f1372570aa.1692697321.git.viresh.kumar@linaro.org> X-Mailer: git-send-email 2.31.1.272.g89b43f80a514 MIME-Version: 1.0 Xen provides support for injecting interrupts to the guests via the HYPERVISOR_dm_op() hypercall. The same is used by the Virtio based device backend implementations, in an inefficient manner currently. Generally, the Virtio backends are implemented to work with the Eventfd based mechanism. In order to make such backends work with Xen, another software layer needs to poll the Eventfds and raise an interrupt to the guest using the Xen based mechanism. This results in an extra context switch. This is not a new problem in Linux though. It is present with other hypervisors like KVM, etc. as well. The generic solution implemented in the kernel for them is to provide an IOCTL call to pass the interrupt details and eventfd, which lets the kernel take care of polling the eventfd and raising of the interrupt, instead of handling this in user space (which involves an extra context switch). This patch adds support to inject a specific interrupt to guest using the eventfd mechanism, by preventing the extra context switch. Inspired by existing implementations for KVM, etc.. Signed-off-by: Viresh Kumar Reviewed-by: Juergen Gross --- V4->V5 - Drop "default m" and update description in Kconfig. - Drop INIT_LIST_HEAD() for kirqfd->list. - Drop local "eventfd" variable. - Drop unnecessary !! usage. V3->V4 - Drop the imported definitions to hvm/dm_op.h. - Make the caller pass a pointer to pre-filled "struct xen_dm_op" instance and get rid of irq and level fields. - Enable the irqfd feature under a new Kconfig entry. V2->V3 - Select EVENTFD from Kconfig V1->V2: - Improve error handling. - Remove the unnecessary usage of list_for_each_entry_safe(). - Restrict the use of XEN_DMOP_set_irq_level to only ARM64. - Import definitions from Xen to hvm/dm_op.h. drivers/xen/Kconfig | 7 + drivers/xen/privcmd.c | 282 ++++++++++++++++++++++++++++++++++++- include/uapi/xen/privcmd.h | 14 ++ 3 files changed, 301 insertions(+), 2 deletions(-) diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig index d5d7c402b651..d43153fec18e 100644 --- a/drivers/xen/Kconfig +++ b/drivers/xen/Kconfig @@ -269,6 +269,13 @@ config XEN_PRIVCMD disaggregated Xen setups this driver might be needed for other domains, too. +config XEN_PRIVCMD_IRQFD + bool "Xen irqfd support" + depends on XEN_PRIVCMD && XEN_VIRTIO && EVENTFD + help + Using the irqfd mechanism a virtio backend running in a daemon can + speed up interrupt injection into a guest. + config XEN_ACPI_PROCESSOR tristate "Xen ACPI processor" depends on XEN && XEN_PV_DOM0 && X86 && ACPI_PROCESSOR && CPU_FREQ diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c index e2f580e30a86..120af57999fc 100644 --- a/drivers/xen/privcmd.c +++ b/drivers/xen/privcmd.c @@ -9,11 +9,16 @@ #define pr_fmt(fmt) "xen:" KBUILD_MODNAME ": " fmt +#include +#include #include #include +#include +#include #include #include #include +#include #include #include #include @@ -833,6 +838,263 @@ static long privcmd_ioctl_mmap_resource(struct file *file, return rc; } +#ifdef CONFIG_XEN_PRIVCMD_IRQFD +/* Irqfd support */ +static struct workqueue_struct *irqfd_cleanup_wq; +static DEFINE_MUTEX(irqfds_lock); +static LIST_HEAD(irqfds_list); + +struct privcmd_kernel_irqfd { + struct xen_dm_op_buf xbufs; + domid_t dom; + bool error; + struct eventfd_ctx *eventfd; + struct work_struct shutdown; + wait_queue_entry_t wait; + struct list_head list; + poll_table pt; +}; + +static void irqfd_deactivate(struct privcmd_kernel_irqfd *kirqfd) +{ + lockdep_assert_held(&irqfds_lock); + + list_del_init(&kirqfd->list); + queue_work(irqfd_cleanup_wq, &kirqfd->shutdown); +} + +static void irqfd_shutdown(struct work_struct *work) +{ + struct privcmd_kernel_irqfd *kirqfd = + container_of(work, struct privcmd_kernel_irqfd, shutdown); + u64 cnt; + + eventfd_ctx_remove_wait_queue(kirqfd->eventfd, &kirqfd->wait, &cnt); + eventfd_ctx_put(kirqfd->eventfd); + kfree(kirqfd); +} + +static void irqfd_inject(struct privcmd_kernel_irqfd *kirqfd) +{ + u64 cnt; + long rc; + + eventfd_ctx_do_read(kirqfd->eventfd, &cnt); + + xen_preemptible_hcall_begin(); + rc = HYPERVISOR_dm_op(kirqfd->dom, 1, &kirqfd->xbufs); + xen_preemptible_hcall_end(); + + /* Don't repeat the error message for consecutive failures */ + if (rc && !kirqfd->error) { + pr_err("Failed to configure irq for guest domain: %d\n", + kirqfd->dom); + } + + kirqfd->error = rc; +} + +static int +irqfd_wakeup(wait_queue_entry_t *wait, unsigned int mode, int sync, void *key) +{ + struct privcmd_kernel_irqfd *kirqfd = + container_of(wait, struct privcmd_kernel_irqfd, wait); + __poll_t flags = key_to_poll(key); + + if (flags & EPOLLIN) + irqfd_inject(kirqfd); + + if (flags & EPOLLHUP) { + mutex_lock(&irqfds_lock); + irqfd_deactivate(kirqfd); + mutex_unlock(&irqfds_lock); + } + + return 0; +} + +static void +irqfd_poll_func(struct file *file, wait_queue_head_t *wqh, poll_table *pt) +{ + struct privcmd_kernel_irqfd *kirqfd = + container_of(pt, struct privcmd_kernel_irqfd, pt); + + add_wait_queue_priority(wqh, &kirqfd->wait); +} + +static int privcmd_irqfd_assign(struct privcmd_irqfd *irqfd) +{ + struct privcmd_kernel_irqfd *kirqfd, *tmp; + __poll_t events; + struct fd f; + void *dm_op; + int ret; + + kirqfd = kzalloc(sizeof(*kirqfd) + irqfd->size, GFP_KERNEL); + if (!kirqfd) + return -ENOMEM; + dm_op = kirqfd + 1; + + if (copy_from_user(dm_op, irqfd->dm_op, irqfd->size)) { + ret = -EFAULT; + goto error_kfree; + } + + kirqfd->xbufs.size = irqfd->size; + set_xen_guest_handle(kirqfd->xbufs.h, dm_op); + kirqfd->dom = irqfd->dom; + INIT_WORK(&kirqfd->shutdown, irqfd_shutdown); + + f = fdget(irqfd->fd); + if (!f.file) { + ret = -EBADF; + goto error_kfree; + } + + kirqfd->eventfd = eventfd_ctx_fileget(f.file); + if (IS_ERR(kirqfd->eventfd)) { + ret = PTR_ERR(kirqfd->eventfd); + goto error_fd_put; + } + + /* + * Install our own custom wake-up handling so we are notified via a + * callback whenever someone signals the underlying eventfd. + */ + init_waitqueue_func_entry(&kirqfd->wait, irqfd_wakeup); + init_poll_funcptr(&kirqfd->pt, irqfd_poll_func); + + mutex_lock(&irqfds_lock); + + list_for_each_entry(tmp, &irqfds_list, list) { + if (kirqfd->eventfd == tmp->eventfd) { + ret = -EBUSY; + mutex_unlock(&irqfds_lock); + goto error_eventfd; + } + } + + list_add_tail(&kirqfd->list, &irqfds_list); + mutex_unlock(&irqfds_lock); + + /* + * Check if there was an event already pending on the eventfd before we + * registered, and trigger it as if we didn't miss it. + */ + events = vfs_poll(f.file, &kirqfd->pt); + if (events & EPOLLIN) + irqfd_inject(kirqfd); + + /* + * Do not drop the file until the kirqfd is fully initialized, otherwise + * we might race against the EPOLLHUP. + */ + fdput(f); + return 0; + +error_eventfd: + eventfd_ctx_put(kirqfd->eventfd); + +error_fd_put: + fdput(f); + +error_kfree: + kfree(kirqfd); + return ret; +} + +static int privcmd_irqfd_deassign(struct privcmd_irqfd *irqfd) +{ + struct privcmd_kernel_irqfd *kirqfd; + struct eventfd_ctx *eventfd; + + eventfd = eventfd_ctx_fdget(irqfd->fd); + if (IS_ERR(eventfd)) + return PTR_ERR(eventfd); + + mutex_lock(&irqfds_lock); + + list_for_each_entry(kirqfd, &irqfds_list, list) { + if (kirqfd->eventfd == eventfd) { + irqfd_deactivate(kirqfd); + break; + } + } + + mutex_unlock(&irqfds_lock); + + eventfd_ctx_put(eventfd); + + /* + * Block until we know all outstanding shutdown jobs have completed so + * that we guarantee there will not be any more interrupts once this + * deassign function returns. + */ + flush_workqueue(irqfd_cleanup_wq); + + return 0; +} + +static long privcmd_ioctl_irqfd(struct file *file, void __user *udata) +{ + struct privcmd_data *data = file->private_data; + struct privcmd_irqfd irqfd; + + if (copy_from_user(&irqfd, udata, sizeof(irqfd))) + return -EFAULT; + + /* No other flags should be set */ + if (irqfd.flags & ~PRIVCMD_IRQFD_FLAG_DEASSIGN) + return -EINVAL; + + /* If restriction is in place, check the domid matches */ + if (data->domid != DOMID_INVALID && data->domid != irqfd.dom) + return -EPERM; + + if (irqfd.flags & PRIVCMD_IRQFD_FLAG_DEASSIGN) + return privcmd_irqfd_deassign(&irqfd); + + return privcmd_irqfd_assign(&irqfd); +} + +static int privcmd_irqfd_init(void) +{ + irqfd_cleanup_wq = alloc_workqueue("privcmd-irqfd-cleanup", 0, 0); + if (!irqfd_cleanup_wq) + return -ENOMEM; + + return 0; +} + +static void privcmd_irqfd_exit(void) +{ + struct privcmd_kernel_irqfd *kirqfd, *tmp; + + mutex_lock(&irqfds_lock); + + list_for_each_entry_safe(kirqfd, tmp, &irqfds_list, list) + irqfd_deactivate(kirqfd); + + mutex_unlock(&irqfds_lock); + + destroy_workqueue(irqfd_cleanup_wq); +} +#else +static inline long privcmd_ioctl_irqfd(struct file *file, void __user *udata) +{ + return -EOPNOTSUPP; +} + +static inline int privcmd_irqfd_init(void) +{ + return 0; +} + +static inline void privcmd_irqfd_exit(void) +{ +} +#endif /* CONFIG_XEN_PRIVCMD_IRQFD */ + static long privcmd_ioctl(struct file *file, unsigned int cmd, unsigned long data) { @@ -868,6 +1130,10 @@ static long privcmd_ioctl(struct file *file, ret = privcmd_ioctl_mmap_resource(file, udata); break; + case IOCTL_PRIVCMD_IRQFD: + ret = privcmd_ioctl_irqfd(file, udata); + break; + default: break; } @@ -992,15 +1258,27 @@ static int __init privcmd_init(void) err = misc_register(&xen_privcmdbuf_dev); if (err != 0) { pr_err("Could not register Xen hypercall-buf device\n"); - misc_deregister(&privcmd_dev); - return err; + goto err_privcmdbuf; + } + + err = privcmd_irqfd_init(); + if (err != 0) { + pr_err("irqfd init failed\n"); + goto err_irqfd; } return 0; + +err_irqfd: + misc_deregister(&xen_privcmdbuf_dev); +err_privcmdbuf: + misc_deregister(&privcmd_dev); + return err; } static void __exit privcmd_exit(void) { + privcmd_irqfd_exit(); misc_deregister(&privcmd_dev); misc_deregister(&xen_privcmdbuf_dev); } diff --git a/include/uapi/xen/privcmd.h b/include/uapi/xen/privcmd.h index d2029556083e..375718ba4ab6 100644 --- a/include/uapi/xen/privcmd.h +++ b/include/uapi/xen/privcmd.h @@ -98,6 +98,18 @@ struct privcmd_mmap_resource { __u64 addr; }; +/* For privcmd_irqfd::flags */ +#define PRIVCMD_IRQFD_FLAG_DEASSIGN (1 << 0) + +struct privcmd_irqfd { + void __user *dm_op; + __u32 size; /* Size of structure pointed by dm_op */ + __u32 fd; + __u32 flags; + domid_t dom; + __u8 pad[2]; +}; + /* * @cmd: IOCTL_PRIVCMD_HYPERCALL * @arg: &privcmd_hypercall_t @@ -125,5 +137,7 @@ struct privcmd_mmap_resource { _IOC(_IOC_NONE, 'P', 6, sizeof(domid_t)) #define IOCTL_PRIVCMD_MMAP_RESOURCE \ _IOC(_IOC_NONE, 'P', 7, sizeof(struct privcmd_mmap_resource)) +#define IOCTL_PRIVCMD_IRQFD \ + _IOC(_IOC_NONE, 'P', 8, sizeof(struct privcmd_irqfd)) #endif /* __LINUX_PUBLIC_PRIVCMD_H__ */