From patchwork Fri Feb 12 19:27:39 2021
X-Patchwork-Submitter: Alex Williamson
X-Patchwork-Id: 12085977
Subject: [PATCH 1/3] vfio: Introduce vma ops registration and notifier
From: Alex Williamson <alex.williamson@redhat.com>
To: alex.williamson@redhat.com
Cc: cohuck@redhat.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
    jgg@nvidia.com, peterx@redhat.com
Date: Fri, 12 Feb 2021 12:27:39 -0700
Message-ID: <161315805248.7320.13358719859656681660.stgit@gimli.home>
In-Reply-To: <161315658638.7320.9686203003395567745.stgit@gimli.home>
References: <161315658638.7320.9686203003395567745.stgit@gimli.home>
User-Agent: StGit/0.21-dirty
X-Mailing-List: kvm@vger.kernel.org

Create an interface through vfio-core where a vfio bus driver (ex.
vfio-pci) can register the vm_operations_struct it uses to map device
memory, along with a set of registration callbacks.  This allows
vfio-core to expose interfaces for IOMMU backends to match a
vm_area_struct to a bus driver and register a notifier for relevant
changes to the device mapping.  For now we define only a notifier
action for closing the device.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/vfio/vfio.c  |  120 ++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/vfio.h |   20 ++++++++
 2 files changed, 140 insertions(+)
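For illustration, this is roughly what a bus driver plugging into the
new interface looks like end to end (a minimal sketch; the foo_*
driver and its fields are hypothetical, and locking is elided):

#include <linux/notifier.h>
#include <linux/slab.h>
#include <linux/vfio.h>

/* Hypothetical bus driver "foo" opting into vma ops registration. */
static const struct vm_operations_struct foo_mmap_ops = { /* ... */ };

struct foo_vma_cookie {
        struct foo_device       *fdev;
        struct notifier_block   *nb;
};

static void *foo_register_vma_nb(struct vm_area_struct *vma,
                                 struct notifier_block *nb)
{
        struct foo_vma_cookie *c = kmalloc(sizeof(*c), GFP_KERNEL);

        if (!c)
                return ERR_PTR(-ENOMEM);

        c->fdev = vma->vm_private_data;
        c->nb = nb;
        if (srcu_notifier_chain_register(&c->fdev->notifier, nb)) {
                kfree(c);
                return ERR_PTR(-EINVAL);
        }
        return c;       /* opaque cookie handed back through vfio-core */
}

static void foo_unregister_vma_nb(void *opaque)
{
        struct foo_vma_cookie *c = opaque;

        srcu_notifier_chain_unregister(&c->fdev->notifier, c->nb);
        kfree(c);
}

static int __init foo_init(void)
{
        return vfio_register_vma_ops(&foo_mmap_ops, foo_register_vma_nb,
                                     foo_unregister_vma_nb);
}

static void __exit foo_exit(void)
{
        vfio_unregister_vma_ops(&foo_mmap_ops);
}

Patch 2 implements exactly this pairing for vfio-pci.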
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 38779e6fd80c..568f5e37a95f 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -47,6 +47,8 @@ static struct vfio {
 	struct cdev			group_cdev;
 	dev_t				group_devt;
 	wait_queue_head_t		release_q;
+	struct list_head		vm_ops_list;
+	struct mutex			vm_ops_lock;
 } vfio;
 
 struct vfio_iommu_driver {
@@ -2354,6 +2356,121 @@ struct iommu_domain *vfio_group_iommu_domain(struct vfio_group *group)
 }
 EXPORT_SYMBOL_GPL(vfio_group_iommu_domain);
 
+struct vfio_vma_ops {
+	const struct vm_operations_struct *vm_ops;
+	vfio_register_vma_nb_t		*reg_fn;
+	vfio_unregister_vma_nb_t	*unreg_fn;
+	struct list_head		next;
+};
+
+int vfio_register_vma_ops(const struct vm_operations_struct *vm_ops,
+			  vfio_register_vma_nb_t *reg_fn,
+			  vfio_unregister_vma_nb_t *unreg_fn)
+{
+	struct vfio_vma_ops *ops;
+	int ret = 0;
+
+	mutex_lock(&vfio.vm_ops_lock);
+	list_for_each_entry(ops, &vfio.vm_ops_list, next) {
+		if (ops->vm_ops == vm_ops) {
+			ret = -EEXIST;
+			goto out;
+		}
+	}
+
+	ops = kmalloc(sizeof(*ops), GFP_KERNEL);
+	if (!ops) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	ops->vm_ops = vm_ops;
+	ops->reg_fn = reg_fn;
+	ops->unreg_fn = unreg_fn;
+
+	list_add(&ops->next, &vfio.vm_ops_list);
+out:
+	mutex_unlock(&vfio.vm_ops_lock);
+	return ret;
+
+}
+EXPORT_SYMBOL_GPL(vfio_register_vma_ops);
+
+void vfio_unregister_vma_ops(const struct vm_operations_struct *vm_ops)
+{
+	struct vfio_vma_ops *ops;
+
+	mutex_lock(&vfio.vm_ops_lock);
+	list_for_each_entry(ops, &vfio.vm_ops_list, next) {
+		if (ops->vm_ops == vm_ops) {
+			list_del(&ops->next);
+			kfree(ops);
+			break;
+		}
+	}
+	mutex_unlock(&vfio.vm_ops_lock);
+}
+EXPORT_SYMBOL_GPL(vfio_unregister_vma_ops);
+
+struct vfio_vma_obj {
+	const struct vm_operations_struct *vm_ops;
+	void				*opaque;
+};
+
+void *vfio_register_vma_nb(struct vm_area_struct *vma,
+			   struct notifier_block *nb)
+{
+	struct vfio_vma_ops *ops;
+	struct vfio_vma_obj *obj = ERR_PTR(-ENODEV);
+
+	mutex_lock(&vfio.vm_ops_lock);
+	list_for_each_entry(ops, &vfio.vm_ops_list, next) {
+		if (ops->vm_ops == vma->vm_ops) {
+			void *opaque;
+
+			obj = kmalloc(sizeof(*obj), GFP_KERNEL);
+			if (!obj) {
+				obj = ERR_PTR(-ENOMEM);
+				break;
+			}
+
+			obj->vm_ops = ops->vm_ops;
+
+			opaque = ops->reg_fn(vma, nb);
+			if (IS_ERR(opaque)) {
+				kfree(obj);
+				obj = opaque;
+			} else {
+				obj->opaque = opaque;
+			}
+
+			break;
+		}
+	}
+	mutex_unlock(&vfio.vm_ops_lock);
+
+	return obj;
+}
+EXPORT_SYMBOL_GPL(vfio_register_vma_nb);
+
+void vfio_unregister_vma_nb(void *opaque)
+{
+	struct vfio_vma_obj *obj = opaque;
+	struct vfio_vma_ops *ops;
+
+	mutex_lock(&vfio.vm_ops_lock);
+	list_for_each_entry(ops, &vfio.vm_ops_list, next) {
+		if (ops->vm_ops == obj->vm_ops) {
+			ops->unreg_fn(obj->opaque);
+			break;
+		}
+	}
+	mutex_unlock(&vfio.vm_ops_lock);
+
+	kfree(obj);
+}
+EXPORT_SYMBOL_GPL(vfio_unregister_vma_nb);
+
 /**
  * Module/class support
  */
@@ -2377,8 +2494,10 @@ static int __init vfio_init(void)
 	idr_init(&vfio.group_idr);
 	mutex_init(&vfio.group_lock);
 	mutex_init(&vfio.iommu_drivers_lock);
+	mutex_init(&vfio.vm_ops_lock);
 	INIT_LIST_HEAD(&vfio.group_list);
 	INIT_LIST_HEAD(&vfio.iommu_drivers_list);
+	INIT_LIST_HEAD(&vfio.vm_ops_list);
 	init_waitqueue_head(&vfio.release_q);
 
 	ret = misc_register(&vfio_dev);
@@ -2425,6 +2544,7 @@
 static void __exit vfio_cleanup(void)
 {
 	WARN_ON(!list_empty(&vfio.group_list));
+	WARN_ON(!list_empty(&vfio.vm_ops_list));
 
 #ifdef CONFIG_VFIO_NOIOMMU
 	vfio_unregister_iommu_driver(&vfio_noiommu_ops);
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index b7e18bde5aa8..1b5c6179d869 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -137,6 +137,26 @@ extern int vfio_dma_rw(struct vfio_group *group, dma_addr_t user_iova,
 
 extern struct iommu_domain *vfio_group_iommu_domain(struct vfio_group *group);
 
+typedef void *(vfio_register_vma_nb_t)(struct vm_area_struct *vma,
+				       struct notifier_block *nb);
+typedef void (vfio_unregister_vma_nb_t)(void *opaque);
+
+extern int vfio_register_vma_ops(const struct vm_operations_struct *vm_ops,
+				 vfio_register_vma_nb_t *reg_fn,
+				 vfio_unregister_vma_nb_t *unreg_fn);
+
+extern void vfio_unregister_vma_ops(const struct vm_operations_struct *vm_ops);
+
+enum vfio_vma_notify_type {
+	VFIO_VMA_NOTIFY_CLOSE = 0,
+};
+
+extern void *vfio_register_vma_nb(struct vm_area_struct *vma,
+				  struct notifier_block *nb);
+
+extern void vfio_unregister_vma_nb(void *opaque);
+
+
 /* each type has independent events */
 enum vfio_notify_type {
 	VFIO_IOMMU_NOTIFY = 0,
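From the consumer side, the contract exposed by these declarations
works out to roughly the following (a sketch with hypothetical my_*
names; patch 3 adds the real type1 user):

#include <linux/notifier.h>
#include <linux/vfio.h>

static int my_vma_nb_cb(struct notifier_block *nb, unsigned long action,
                        void *unused)
{
        if (action == VFIO_VMA_NOTIFY_CLOSE) {
                /* device backing this vma is going away; tear down any
                 * DMA translations that target it */
        }
        return NOTIFY_OK;
}

static struct notifier_block my_vma_nb = {
        .notifier_call = my_vma_nb_cb,
};

static int my_track_pfnmap_vma(struct vm_area_struct *vma, void **cookie)
{
        *cookie = vfio_register_vma_nb(vma, &my_vma_nb);

        /* ERR_PTR(-ENODEV): no bus driver registered these vm_ops */
        if (IS_ERR(*cookie))
                return PTR_ERR(*cookie);
        return 0;
}

/* ...and when the DMA mapping is torn down: vfio_unregister_vma_nb(cookie) */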
From patchwork Fri Feb 12 19:27:55 2021
X-Patchwork-Submitter: Alex Williamson
X-Patchwork-Id: 12085979
Subject: [PATCH 2/3] vfio/pci: Implement vm_ops registration
From: Alex Williamson <alex.williamson@redhat.com>
To: alex.williamson@redhat.com
Cc: cohuck@redhat.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
    jgg@nvidia.com, peterx@redhat.com
Date: Fri, 12 Feb 2021 12:27:55 -0700
Message-ID: <161315807103.7320.16122193489358171384.stgit@gimli.home>
In-Reply-To: <161315658638.7320.9686203003395567745.stgit@gimli.home>
References: <161315658638.7320.9686203003395567745.stgit@gimli.home>
User-Agent: StGit/0.21-dirty
X-Mailing-List: kvm@vger.kernel.org

The vfio-pci bus driver implements a vm_operations_struct for managing
mmaps to device BARs.  Given a vma with matching vm_ops, we can
therefore create a reference using the existing vfio external user
interfaces and register the provided notifier to receive callbacks
relevant to the device.  The close notifier is implemented for when
the device is released, rather than when the vma is closed, to avoid
possibly breaking userspace (ie. mmap -> dma map -> munmap is
currently allowed and maintains the dma mapping to the device).

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 drivers/vfio/pci/Kconfig            |    1 
 drivers/vfio/pci/vfio_pci.c         |   87 +++++++++++++++++++++++++++++++++++
 drivers/vfio/pci/vfio_pci_private.h |    1 
 3 files changed, 89 insertions(+)
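For reference, the underlying kernel primitive used below is the SRCU
flavor of notifier chains from <linux/notifier.h>; in isolation the
pattern is (a generic sketch, not the vfio-pci code itself):

#include <linux/notifier.h>

static struct srcu_notifier_head nh;    /* per-device in the real code */

static void example_setup(void)
{
        srcu_init_notifier_head(&nh);   /* once, at probe time */
}

static void example_attach(struct notifier_block *nb)
{
        /* register/unregister may block, but do not exclude
         * concurrent callers of srcu_notifier_call_chain() */
        srcu_notifier_chain_register(&nh, nb);
}

static void example_fire(void)
{
        /* callbacks run under srcu_read_lock(), not the chain's rwsem,
         * so a callback may take locks its registrar holds elsewhere
         * without risking an AB-BA deadlock */
        srcu_notifier_call_chain(&nh, VFIO_VMA_NOTIFY_CLOSE, NULL);
}

That property is what the locking comment in the registration path
below relies on.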
diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
index 40a223381ab6..4e3059d6206c 100644
--- a/drivers/vfio/pci/Kconfig
+++ b/drivers/vfio/pci/Kconfig
@@ -4,6 +4,7 @@ config VFIO_PCI
 	depends on VFIO && PCI && EVENTFD
 	select VFIO_VIRQFD
 	select IRQ_BYPASS_MANAGER
+	select SRCU
 	help
 	  Support for the PCI VFIO bus driver.  This is required to make
 	  use of PCI drivers using the VFIO framework.
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 706de3ef94bb..dcbdaece80f8 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -560,6 +560,8 @@ static void vfio_pci_release(void *device_data)
 	mutex_lock(&vdev->reflck->lock);
 	if (!(--vdev->refcnt)) {
+		srcu_notifier_call_chain(&vdev->vma_notifier,
+					 VFIO_VMA_NOTIFY_CLOSE, NULL);
 		vfio_pci_vf_token_user_add(vdev, -1);
 		vfio_spapr_pci_eeh_release(vdev->pdev);
 		vfio_pci_disable(vdev);
@@ -1969,6 +1971,7 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	mutex_init(&vdev->vma_lock);
 	INIT_LIST_HEAD(&vdev->vma_list);
 	init_rwsem(&vdev->memory_lock);
+	srcu_init_notifier_head(&vdev->vma_notifier);
 
 	ret = vfio_add_group_dev(&pdev->dev, &vfio_pci_ops, vdev);
 	if (ret)
@@ -2362,6 +2365,7 @@ static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev)
 
 static void __exit vfio_pci_cleanup(void)
 {
+	vfio_unregister_vma_ops(&vfio_pci_mmap_ops);
 	pci_unregister_driver(&vfio_pci_driver);
 	vfio_pci_uninit_perm_bits();
 }
@@ -2407,6 +2411,81 @@ static void __init vfio_pci_fill_ids(void)
 	}
 }
 
+struct vfio_pci_vma_obj {
+	struct vfio_pci_device	*vdev;
+	struct vfio_group	*group;
+	struct vfio_device	*device;
+	struct notifier_block	*nb;
+};
+
+static void *vfio_pci_register_vma_notifier(struct vm_area_struct *vma,
+					    struct notifier_block *nb)
+{
+	struct vfio_pci_device *vdev = vma->vm_private_data;
+	struct vfio_pci_vma_obj *obj;
+	struct vfio_group *group;
+	struct vfio_device *device;
+	int ret;
+
+	if (!vdev || vma->vm_ops != &vfio_pci_mmap_ops)
+		return ERR_PTR(-EINVAL);
+
+	obj = kmalloc(sizeof(*obj), GFP_KERNEL);
+	if (!obj)
+		return ERR_PTR(-ENOMEM);
+
+	/*
+	 * Get a group and container reference, this prevents the container
+	 * from being torn down while this vma is mapped, ie. device stays
+	 * isolated.
+	 *
+	 * NB. The container must be torn down on device close without
+	 * explicit unmaps, therefore we must notify on close.
+	 */
+	group = vfio_group_get_external_user_from_dev(&vdev->pdev->dev);
+	if (IS_ERR(group)) {
+		kfree(obj);
+		return group;
+	}
+
+	/* Also need device reference to prevent unbind */
+	device = vfio_device_get_from_dev(&vdev->pdev->dev);
+	if (IS_ERR(device)) {
+		vfio_group_put_external_user(group);
+		kfree(obj);
+		return device;
+	}
+	/*
+	 * Use the srcu notifier chain variant to avoid AB-BA locking issues
+	 * with the caller, ex. iommu->lock vs nh->rwsem
+	 */
+	ret = srcu_notifier_chain_register(&vdev->vma_notifier, nb);
+	if (ret) {
+		vfio_device_put(device);
+		vfio_group_put_external_user(group);
+		kfree(obj);
+		return ERR_PTR(ret);
+	}
+
+	obj->vdev = vdev;
+	obj->group = group;
+	obj->device = device;
+	obj->nb = nb;
+
+	return obj;
+}
+
+static void vfio_pci_unregister_vma_notifier(void *opaque)
+{
+	struct vfio_pci_vma_obj *obj = opaque;
+
+	srcu_notifier_chain_unregister(&obj->vdev->vma_notifier, obj->nb);
+	vfio_device_put(obj->device);
+	vfio_group_put_external_user(obj->group);
+	kfree(obj);
+}
+
 static int __init vfio_pci_init(void)
 {
 	int ret;
@@ -2421,6 +2500,12 @@ static int __init vfio_pci_init(void)
 	if (ret)
 		goto out_driver;
 
+	ret = vfio_register_vma_ops(&vfio_pci_mmap_ops,
+				    vfio_pci_register_vma_notifier,
+				    vfio_pci_unregister_vma_notifier);
+	if (ret)
+		goto out_vma;
+
 	vfio_pci_fill_ids();
 
 	if (disable_denylist)
@@ -2428,6 +2513,8 @@ static int __init vfio_pci_init(void)
 
 	return 0;
 
+out_vma:
+	pci_unregister_driver(&vfio_pci_driver);
 out_driver:
 	vfio_pci_uninit_perm_bits();
 	return ret;
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index 5c90e560c5c7..12c61d099d1a 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -142,6 +142,7 @@ struct vfio_pci_device {
 	struct mutex		vma_lock;
 	struct list_head	vma_list;
 	struct rw_semaphore	memory_lock;
+	struct srcu_notifier_head vma_notifier;
 };
 
 #define is_intx(vdev) (vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX)
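To make the userspace-visible semantics concrete, the sequence below
remains legal with this series; this is a sketch where bar0_offset
stands in for the mmap offset reported by VFIO_DEVICE_GET_REGION_INFO:

#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/vfio.h>

static int map_then_unmap(int container_fd, int device_fd,
                          off_t bar0_offset, size_t len, __u64 iova)
{
        /* map BAR0, expose it to the IOMMU, then unmap the vma */
        void *bar = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
                         device_fd, bar0_offset);
        struct vfio_iommu_type1_dma_map map = {
                .argsz = sizeof(map),
                .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
                .vaddr = (unsigned long)bar,
                .iova = iova,
                .size = len,
        };

        if (bar == MAP_FAILED ||
            ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map))
                return -1;

        munmap(bar, len);       /* the DMA mapping to the BAR is retained */
        close(device_fd);       /* release fires VFIO_VMA_NOTIFY_CLOSE and,
                                 * with patch 3, removes the stale mapping */
        return 0;
}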
From patchwork Fri Feb 12 19:28:09 2021
X-Patchwork-Submitter: Alex Williamson
X-Patchwork-Id: 12085981
Subject: [PATCH 3/3] vfio/type1: Implement vma registration and restriction
From: Alex Williamson <alex.williamson@redhat.com>
To: alex.williamson@redhat.com
Cc: cohuck@redhat.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
    jgg@nvidia.com, peterx@redhat.com
Date: Fri, 12 Feb 2021 12:28:09 -0700
Message-ID: <161315808144.7320.2973346461158505515.stgit@gimli.home>
In-Reply-To: <161315658638.7320.9686203003395567745.stgit@gimli.home>
References: <161315658638.7320.9686203003395567745.stgit@gimli.home>
User-Agent: StGit/0.21-dirty
X-Mailing-List: kvm@vger.kernel.org

Use the new vma registration interface to configure a notifier for DMA
mappings to device memory.  On close notification, remove the mapping.
This allows us to implement a new default policy restricting PFNMAP
mappings to only those vmas whose vm_ops is registered with vfio-core
to provide this support.  A new module option is provided to opt out
should this conflict with existing use cases.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reported-by: kernel test robot
---
 drivers/vfio/vfio_iommu_type1.c |  192 +++++++++++++++++++++++++++++++--------
 1 file changed, 155 insertions(+), 37 deletions(-)
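One subtlety in the patch below: a notifier cannot unregister itself
from within its own chain invocation, so the close handler defers the
unregistration to a workqueue. The pattern in isolation (a trimmed
sketch of the same logic as unregister_vma_bg/vfio_vma_nb_cb below):

#include <linux/notifier.h>
#include <linux/workqueue.h>

struct pfnmap_obj {
        struct notifier_block   nb;
        struct work_struct      work;
        void                    *opaque;
};

static void unregister_bg(struct work_struct *work)
{
        struct pfnmap_obj *pfnmap = container_of(work, struct pfnmap_obj,
                                                 work);

        vfio_unregister_vma_nb(pfnmap->opaque); /* safe outside the chain */
        kfree(pfnmap);
}

static int close_cb(struct notifier_block *nb, unsigned long action,
                    void *unused)
{
        struct pfnmap_obj *pfnmap = container_of(nb, struct pfnmap_obj, nb);

        /* ...tear down the DMA mapping, then defer self-unregistration */
        INIT_WORK(&pfnmap->work, unregister_bg);
        schedule_work(&pfnmap->work);
        return NOTIFY_OK;
}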
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 90715413c3d9..137aab2a00fd 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -62,6 +62,11 @@ module_param_named(dma_entry_limit, dma_entry_limit, uint, 0644);
 MODULE_PARM_DESC(dma_entry_limit,
 		 "Maximum number of user DMA mappings per container (65535).");
 
+static bool strict_mmio_maps = true;
+module_param_named(strict_mmio_maps, strict_mmio_maps, bool, 0644);
+MODULE_PARM_DESC(strict_mmio_maps,
+		 "Restrict to safe DMA mappings of device memory (true).");
+
 struct vfio_iommu {
 	struct list_head	domain_list;
 	struct list_head	iova_list;
@@ -89,6 +94,13 @@ struct vfio_domain {
 	bool			fgsp;		/* Fine-grained super pages */
 };
 
+struct pfnmap_obj {
+	struct notifier_block	nb;
+	struct work_struct	work;
+	struct vfio_iommu	*iommu;
+	void			*opaque;
+};
+
 struct vfio_dma {
 	struct rb_node		node;
 	dma_addr_t		iova;		/* Device address */
@@ -101,6 +113,7 @@ struct vfio_dma {
 	struct task_struct	*task;
 	struct rb_root		pfn_list;	/* Ex-user pinned pfn list */
 	unsigned long		*bitmap;
+	struct pfnmap_obj	*pfnmap;
 };
 
 struct vfio_group {
@@ -495,15 +508,108 @@ static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
 	return ret;
 }
 
-static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
-			 int prot, unsigned long *pfn)
+static void unregister_vma_bg(struct work_struct *work)
+{
+	struct pfnmap_obj *pfnmap = container_of(work, struct pfnmap_obj, work);
+
+	vfio_unregister_vma_nb(pfnmap->opaque);
+	kfree(pfnmap);
+}
+
+struct vfio_dma *pfnmap_find_dma(struct pfnmap_obj *pfnmap)
+{
+	struct rb_node *n;
+
+	for (n = rb_first(&pfnmap->iommu->dma_list); n; n = rb_next(n)) {
+		struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
+
+		if (dma->pfnmap == pfnmap)
+			return dma;
+	}
+
+	return NULL;
+}
+
+/* Return 1 if iommu->lock dropped and notified, 0 if done */
+static int unmap_dma_pfn_list(struct vfio_iommu *iommu, struct vfio_dma *dma,
+			      struct vfio_dma **dma_last, int *retries)
+{
+	if (!RB_EMPTY_ROOT(&dma->pfn_list)) {
+		struct vfio_iommu_type1_dma_unmap nb_unmap;
+
+		if (*dma_last == dma) {
+			BUG_ON(++(*retries) > 10);
+		} else {
+			*dma_last = dma;
+			*retries = 0;
+		}
+
+		nb_unmap.iova = dma->iova;
+		nb_unmap.size = dma->size;
+
+		/*
+		 * Notify anyone (mdev vendor drivers) to invalidate and
+		 * unmap iovas within the range we're about to unmap.
+		 * Vendor drivers MUST unpin pages in response to an
+		 * invalidation.
+		 */
+		mutex_unlock(&iommu->lock);
+		blocking_notifier_call_chain(&iommu->notifier,
+					     VFIO_IOMMU_NOTIFY_DMA_UNMAP,
+					     &nb_unmap);
+		return 1;
+	}
+
+	return 0;
+}
+
+static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma);
+
+static int vfio_vma_nb_cb(struct notifier_block *nb,
+			  unsigned long action, void *unused)
+{
+	struct pfnmap_obj *pfnmap = container_of(nb, struct pfnmap_obj, nb);
+
+	switch (action) {
+	case VFIO_VMA_NOTIFY_CLOSE:
+	{
+		struct vfio_dma *dma, *dma_last = NULL;
+		int retries = 0;
+
+again:
+		mutex_lock(&pfnmap->iommu->lock);
+		dma = pfnmap_find_dma(pfnmap);
+		if (dma) {
+			if (unmap_dma_pfn_list(pfnmap->iommu, dma,
+					       &dma_last, &retries))
+				goto again;
+
+			dma->pfnmap = NULL;
+			vfio_remove_dma(pfnmap->iommu, dma);
+		}
+		mutex_unlock(&pfnmap->iommu->lock);
+
+		/* Cannot unregister notifier from callback chain */
+		INIT_WORK(&pfnmap->work, unregister_vma_bg);
+		schedule_work(&pfnmap->work);
+
+		break;
+	}
+	}
+
+	return NOTIFY_OK;
+}
+
+static int vaddr_get_pfn(struct vfio_iommu *iommu, struct vfio_dma *dma,
+			 struct mm_struct *mm, unsigned long vaddr,
+			 unsigned long *pfn)
 {
 	struct page *page[1];
 	struct vm_area_struct *vma;
 	unsigned int flags = 0;
 	int ret;
 
-	if (prot & IOMMU_WRITE)
+	if (dma->prot & IOMMU_WRITE)
 		flags |= FOLL_WRITE;
 
 	mmap_read_lock(mm);
@@ -521,12 +627,40 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
 		vma = find_vma_intersection(mm, vaddr, vaddr + 1);
 
 		if (vma && vma->vm_flags & VM_PFNMAP) {
-			ret = follow_fault_pfn(vma, mm, vaddr, pfn, prot & IOMMU_WRITE);
+			ret = follow_fault_pfn(vma, mm, vaddr, pfn,
+					       dma->prot & IOMMU_WRITE);
 			if (ret == -EAGAIN)
 				goto retry;
 
-			if (!ret && !is_invalid_reserved_pfn(*pfn))
+			if (!ret && !is_invalid_reserved_pfn(*pfn)) {
 				ret = -EFAULT;
+				goto done;
+			}
+
+			if (!dma->pfnmap) {
+				struct pfnmap_obj *pfnmap;
+				void *opaque;
+
+				pfnmap = kzalloc(sizeof(*pfnmap), GFP_KERNEL);
+				if (!pfnmap) {
+					ret = -ENOMEM;
+					goto done;
+				}
+
+				pfnmap->iommu = iommu;
+				pfnmap->nb.notifier_call = vfio_vma_nb_cb;
+				opaque = vfio_register_vma_nb(vma, &pfnmap->nb);
+				if (IS_ERR(opaque)) {
+					kfree(pfnmap);
+					if (strict_mmio_maps) {
+						ret = PTR_ERR(opaque);
+						goto done;
+					}
+				} else {
+					pfnmap->opaque = opaque;
+					dma->pfnmap = pfnmap;
+				}
+			}
 		}
 	}
 done:
 	mmap_read_unlock(mm);
@@ -593,7 +727,8 @@ static int vfio_wait_all_valid(struct vfio_iommu *iommu)
  * the iommu can only map chunks of consecutive pfns anyway, so get the
  * first page and all consecutive pages with the same locking.
 */
-static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
+static long vfio_pin_pages_remote(struct vfio_iommu *iommu,
+				  struct vfio_dma *dma, unsigned long vaddr,
 				  long npage, unsigned long *pfn_base,
 				  unsigned long limit)
 {
@@ -606,7 +741,7 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
 	if (!current->mm)
 		return -ENODEV;
 
-	ret = vaddr_get_pfn(current->mm, vaddr, dma->prot, pfn_base);
+	ret = vaddr_get_pfn(iommu, dma, current->mm, vaddr, pfn_base);
 	if (ret)
 		return ret;
 
@@ -633,7 +768,7 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
 	/* Lock all the consecutive pages from pfn_base */
 	for (vaddr += PAGE_SIZE, iova += PAGE_SIZE; pinned < npage;
 	     pinned++, vaddr += PAGE_SIZE, iova += PAGE_SIZE) {
-		ret = vaddr_get_pfn(current->mm, vaddr, dma->prot, &pfn);
+		ret = vaddr_get_pfn(iommu, dma, current->mm, vaddr, &pfn);
 		if (ret)
 			break;
 
@@ -693,7 +828,8 @@ static long vfio_unpin_pages_remote(struct vfio_dma *dma, dma_addr_t iova,
 	return unlocked;
 }
 
-static int vfio_pin_page_external(struct vfio_dma *dma, unsigned long vaddr,
+static int vfio_pin_page_external(struct vfio_iommu *iommu,
+				  struct vfio_dma *dma, unsigned long vaddr,
 				  unsigned long *pfn_base, bool do_accounting)
 {
 	struct mm_struct *mm;
@@ -703,7 +839,7 @@ static int vfio_pin_page_external(struct vfio_dma *dma, unsigned long vaddr,
 	if (!mm)
 		return -ENODEV;
 
-	ret = vaddr_get_pfn(mm, vaddr, dma->prot, pfn_base);
+	ret = vaddr_get_pfn(iommu, dma, mm, vaddr, pfn_base);
 	if (!ret && do_accounting && !is_invalid_reserved_pfn(*pfn_base)) {
 		ret = vfio_lock_acct(dma, 1, true);
 		if (ret) {
@@ -811,8 +947,8 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
 		}
 
 		remote_vaddr = dma->vaddr + (iova - dma->iova);
-		ret = vfio_pin_page_external(dma, remote_vaddr, &phys_pfn[i],
-					     do_accounting);
+		ret = vfio_pin_page_external(iommu, dma, remote_vaddr,
+					     &phys_pfn[i], do_accounting);
 		if (ret)
 			goto pin_unwind;
 
@@ -1071,6 +1207,10 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
 static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
 {
 	WARN_ON(!RB_EMPTY_ROOT(&dma->pfn_list));
+	if (dma->pfnmap) {
+		vfio_unregister_vma_nb(dma->pfnmap->opaque);
+		kfree(dma->pfnmap);
+	}
 	vfio_unmap_unpin(iommu, dma, true);
 	vfio_unlink_dma(iommu, dma);
 	put_task_struct(dma->task);
@@ -1318,29 +1458,7 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
 			continue;
 		}
 
-		if (!RB_EMPTY_ROOT(&dma->pfn_list)) {
-			struct vfio_iommu_type1_dma_unmap nb_unmap;
-
-			if (dma_last == dma) {
-				BUG_ON(++retries > 10);
-			} else {
-				dma_last = dma;
-				retries = 0;
-			}
-
-			nb_unmap.iova = dma->iova;
-			nb_unmap.size = dma->size;
-
-			/*
-			 * Notify anyone (mdev vendor drivers) to invalidate and
-			 * unmap iovas within the range we're about to unmap.
-			 * Vendor drivers MUST unpin pages in response to an
-			 * invalidation.
-			 */
-			mutex_unlock(&iommu->lock);
-			blocking_notifier_call_chain(&iommu->notifier,
-						     VFIO_IOMMU_NOTIFY_DMA_UNMAP,
-						     &nb_unmap);
+		if (unmap_dma_pfn_list(iommu, dma, &dma_last, &retries)) {
 			mutex_lock(&iommu->lock);
 			goto again;
 		}
@@ -1404,7 +1522,7 @@ static int vfio_pin_map_dma(struct vfio_iommu *iommu, struct vfio_dma *dma,
 
 	while (size) {
 		/* Pin a contiguous chunk of memory */
-		npage = vfio_pin_pages_remote(dma, vaddr + dma->size,
+		npage = vfio_pin_pages_remote(iommu, dma, vaddr + dma->size,
 					      size >> PAGE_SHIFT, &pfn, limit);
 		if (npage <= 0) {
 			WARN_ON(!npage);
@@ -1660,7 +1778,7 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
 				size_t n = dma->iova + dma->size - iova;
 				long npage;
 
-				npage = vfio_pin_pages_remote(dma, vaddr,
+				npage = vfio_pin_pages_remote(iommu, dma, vaddr,
 							      n >> PAGE_SHIFT,
 							      &pfn, limit);
 				if (npage <= 0) {