From patchwork Mon Feb 22 16:50:35 2021
X-Patchwork-Submitter: Alex Williamson
X-Patchwork-Id: 12099173
Subject: [RFC PATCH 01/10] vfio: Create vfio_fs_type with inode per device
From: Alex Williamson
To: alex.williamson@redhat.com
Cc: cohuck@redhat.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
    jgg@nvidia.com, peterx@redhat.com
Date: Mon, 22 Feb 2021 09:50:35 -0700
Message-ID: <161401263517.16443.7534035240372538844.stgit@gimli.home>
In-Reply-To: <161401167013.16443.8389863523766611711.stgit@gimli.home>
References: <161401167013.16443.8389863523766611711.stgit@gimli.home>

By linking all the device fds we provide to userspace to an address
space through a new pseudo fs, we can use tools like
unmap_mapping_range() to zap all vmas associated with a device.
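As a sketch of what this enables (illustrative only, not part of the patch;
zap_all_device_mmaps() is a hypothetical helper name): because every fd for a
device now shares one struct address_space, a single unmap_mapping_range()
call tears down every VMA created from any of those fds.

	/* Hypothetical helper, assuming the per-device inode added below. */
	static void zap_all_device_mmaps(struct vfio_device *device)
	{
		/* holelen == 0 means "through the end of the file" */
		unmap_mapping_range(device->inode->i_mapping, 0, 0, true);
	}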
Suggested-by: Jason Gunthorpe Signed-off-by: Alex Williamson --- drivers/vfio/vfio.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 54 insertions(+) diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index 38779e6fd80c..464caef97aff 100644 --- a/drivers/vfio/vfio.c +++ b/drivers/vfio/vfio.c @@ -32,11 +32,18 @@ #include #include #include +#include +#include #define DRIVER_VERSION "0.3" #define DRIVER_AUTHOR "Alex Williamson " #define DRIVER_DESC "VFIO - User Level meta-driver" +#define VFIO_MAGIC 0x5646494f /* "VFIO" */ + +static int vfio_fs_cnt; +static struct vfsmount *vfio_fs_mnt; + static struct vfio { struct class *class; struct list_head iommu_drivers_list; @@ -97,6 +104,7 @@ struct vfio_device { struct vfio_group *group; struct list_head group_next; void *device_data; + struct inode *inode; }; #ifdef CONFIG_VFIO_NOIOMMU @@ -529,6 +537,34 @@ static struct vfio_group *vfio_group_get_from_dev(struct device *dev) return group; } +static int vfio_fs_init_fs_context(struct fs_context *fc) +{ + return init_pseudo(fc, VFIO_MAGIC) ? 0 : -ENOMEM; +} + +static struct file_system_type vfio_fs_type = { + .name = "vfio", + .owner = THIS_MODULE, + .init_fs_context = vfio_fs_init_fs_context, + .kill_sb = kill_anon_super, +}; + +static struct inode *vfio_fs_inode_new(void) +{ + struct inode *inode; + int ret; + + ret = simple_pin_fs(&vfio_fs_type, &vfio_fs_mnt, &vfio_fs_cnt); + if (ret) + return ERR_PTR(ret); + + inode = alloc_anon_inode(vfio_fs_mnt->mnt_sb); + if (IS_ERR(inode)) + simple_release_fs(&vfio_fs_mnt, &vfio_fs_cnt); + + return inode; +} + /** * Device objects - create, release, get, put, search */ @@ -539,11 +575,19 @@ struct vfio_device *vfio_group_create_device(struct vfio_group *group, void *device_data) { struct vfio_device *device; + struct inode *inode; device = kzalloc(sizeof(*device), GFP_KERNEL); if (!device) return ERR_PTR(-ENOMEM); + inode = vfio_fs_inode_new(); + if (IS_ERR(inode)) { + kfree(device); + return (struct vfio_device *)inode; + } + device->inode = inode; + kref_init(&device->kref); device->dev = dev; device->group = group; @@ -574,6 +618,9 @@ static void vfio_device_release(struct kref *kref) dev_set_drvdata(device->dev, NULL); + iput(device->inode); + simple_release_fs(&vfio_fs_mnt, &vfio_fs_cnt); + kfree(device); /* vfio_del_group_dev may be waiting for this device */ @@ -1488,6 +1535,13 @@ static int vfio_group_get_device_fd(struct vfio_group *group, char *buf) */ filep->f_mode |= (FMODE_LSEEK | FMODE_PREAD | FMODE_PWRITE); + /* + * Use the pseudo fs inode on the device to link all mmaps + * to the same address space, allowing us to unmap all vmas + * associated to this device using unmap_mapping_range(). 
+	 */
+	filep->f_mapping = device->inode->i_mapping;
+
 	atomic_inc(&group->container_users);
 
 	fd_install(ret, filep);

From patchwork Mon Feb 22 16:50:47 2021
X-Patchwork-Submitter: Alex Williamson
X-Patchwork-Id: 12099175
Subject: [RFC PATCH 02/10] vfio: Update vfio_add_group_dev() API
From: Alex Williamson
To: alex.williamson@redhat.com
Cc: cohuck@redhat.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
    jgg@nvidia.com, peterx@redhat.com
Date: Mon, 22 Feb 2021 09:50:47 -0700
Message-ID: <161401264735.16443.5908636631567017543.stgit@gimli.home>
In-Reply-To: <161401167013.16443.8389863523766611711.stgit@gimli.home>
References: <161401167013.16443.8389863523766611711.stgit@gimli.home>

Rather than an errno, return a pointer to the opaque vfio_device to
allow the bus driver to call into vfio-core without additional lookups
and references.  Note that bus drivers are still required to use
vfio_del_group_dev() to tear down the vfio_device.
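The converted callers below follow this pattern (sketch only;
my_bus_vfio_ops and vdev stand in for a bus driver's ops and private data):

	struct vfio_device *device;

	device = vfio_add_group_dev(dev, &my_bus_vfio_ops, vdev);
	if (IS_ERR(device))
		return PTR_ERR(device);	/* no vfio_device was created */

	/*
	 * 'device' may be passed to vfio-core helpers directly; teardown
	 * still goes through vfio_del_group_dev(dev).
	 */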
Signed-off-by: Alex Williamson --- drivers/vfio/fsl-mc/vfio_fsl_mc.c | 6 ++++-- drivers/vfio/mdev/vfio_mdev.c | 5 ++++- drivers/vfio/pci/vfio_pci.c | 7 +++++-- drivers/vfio/platform/vfio_platform_common.c | 7 +++++-- drivers/vfio/vfio.c | 23 ++++++++++------------- include/linux/vfio.h | 6 +++--- 6 files changed, 31 insertions(+), 23 deletions(-) diff --git a/drivers/vfio/fsl-mc/vfio_fsl_mc.c b/drivers/vfio/fsl-mc/vfio_fsl_mc.c index f27e25112c40..a4c2d0b9cd51 100644 --- a/drivers/vfio/fsl-mc/vfio_fsl_mc.c +++ b/drivers/vfio/fsl-mc/vfio_fsl_mc.c @@ -592,6 +592,7 @@ static int vfio_fsl_mc_probe(struct fsl_mc_device *mc_dev) struct iommu_group *group; struct vfio_fsl_mc_device *vdev; struct device *dev = &mc_dev->dev; + struct vfio_device *device; int ret; group = vfio_iommu_group_get(dev); @@ -608,8 +609,9 @@ static int vfio_fsl_mc_probe(struct fsl_mc_device *mc_dev) vdev->mc_dev = mc_dev; - ret = vfio_add_group_dev(dev, &vfio_fsl_mc_ops, vdev); - if (ret) { + device = vfio_add_group_dev(dev, &vfio_fsl_mc_ops, vdev); + if (IS_ERR(device)) { + ret = PTR_ERR(device); dev_err(dev, "VFIO_FSL_MC: Failed to add to vfio group\n"); goto out_group_put; } diff --git a/drivers/vfio/mdev/vfio_mdev.c b/drivers/vfio/mdev/vfio_mdev.c index b52eea128549..32901b265864 100644 --- a/drivers/vfio/mdev/vfio_mdev.c +++ b/drivers/vfio/mdev/vfio_mdev.c @@ -124,8 +124,11 @@ static const struct vfio_device_ops vfio_mdev_dev_ops = { static int vfio_mdev_probe(struct device *dev) { struct mdev_device *mdev = to_mdev_device(dev); + struct vfio_device *device; - return vfio_add_group_dev(dev, &vfio_mdev_dev_ops, mdev); + device = vfio_add_group_dev(dev, &vfio_mdev_dev_ops, mdev); + + return IS_ERR(device) ? PTR_ERR(device) : 0; } static void vfio_mdev_remove(struct device *dev) diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index 65e7e6b44578..f0a1d05f0137 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -1926,6 +1926,7 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) { struct vfio_pci_device *vdev; struct iommu_group *group; + struct vfio_device *device; int ret; if (vfio_pci_is_denylisted(pdev)) @@ -1968,9 +1969,11 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) INIT_LIST_HEAD(&vdev->vma_list); init_rwsem(&vdev->memory_lock); - ret = vfio_add_group_dev(&pdev->dev, &vfio_pci_ops, vdev); - if (ret) + device = vfio_add_group_dev(&pdev->dev, &vfio_pci_ops, vdev); + if (IS_ERR(device)) { + ret = PTR_ERR(device); goto out_free; + } ret = vfio_pci_reflck_attach(vdev); if (ret) diff --git a/drivers/vfio/platform/vfio_platform_common.c b/drivers/vfio/platform/vfio_platform_common.c index fb4b385191f2..ff41fe0b758e 100644 --- a/drivers/vfio/platform/vfio_platform_common.c +++ b/drivers/vfio/platform/vfio_platform_common.c @@ -657,6 +657,7 @@ int vfio_platform_probe_common(struct vfio_platform_device *vdev, struct device *dev) { struct iommu_group *group; + struct vfio_device *device; int ret; if (!vdev) @@ -685,9 +686,11 @@ int vfio_platform_probe_common(struct vfio_platform_device *vdev, goto put_reset; } - ret = vfio_add_group_dev(dev, &vfio_platform_ops, vdev); - if (ret) + device = vfio_add_group_dev(dev, &vfio_platform_ops, vdev); + if (IS_ERR(device)) { + ret = PTR_ERR(device); goto put_iommu; + } mutex_init(&vdev->igate); diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index 464caef97aff..067cd843961c 100644 --- a/drivers/vfio/vfio.c +++ b/drivers/vfio/vfio.c @@ -848,8 +848,9 @@ static int 
vfio_iommu_group_notifier(struct notifier_block *nb, /** * VFIO driver API */ -int vfio_add_group_dev(struct device *dev, - const struct vfio_device_ops *ops, void *device_data) +struct vfio_device *vfio_add_group_dev(struct device *dev, + const struct vfio_device_ops *ops, + void *device_data) { struct iommu_group *iommu_group; struct vfio_group *group; @@ -857,14 +858,14 @@ int vfio_add_group_dev(struct device *dev, iommu_group = iommu_group_get(dev); if (!iommu_group) - return -EINVAL; + return ERR_PTR(-EINVAL); group = vfio_group_get_from_iommu(iommu_group); if (!group) { group = vfio_create_group(iommu_group); if (IS_ERR(group)) { iommu_group_put(iommu_group); - return PTR_ERR(group); + return (struct vfio_device *)group; } } else { /* @@ -880,23 +881,19 @@ int vfio_add_group_dev(struct device *dev, iommu_group_id(iommu_group)); vfio_device_put(device); vfio_group_put(group); - return -EBUSY; + return ERR_PTR(-EBUSY); } device = vfio_group_create_device(group, dev, ops, device_data); - if (IS_ERR(device)) { - vfio_group_put(group); - return PTR_ERR(device); - } /* - * Drop all but the vfio_device reference. The vfio_device holds - * a reference to the vfio_group, which holds a reference to the - * iommu_group. + * Drop all but the vfio_device reference. The vfio_device, if + * !IS_ERR() holds a reference to the vfio_group, which holds a + * reference to the iommu_group. */ vfio_group_put(group); - return 0; + return device; } EXPORT_SYMBOL_GPL(vfio_add_group_dev); diff --git a/include/linux/vfio.h b/include/linux/vfio.h index b7e18bde5aa8..b784463000d4 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -48,9 +48,9 @@ struct vfio_device_ops { extern struct iommu_group *vfio_iommu_group_get(struct device *dev); extern void vfio_iommu_group_put(struct iommu_group *group, struct device *dev); -extern int vfio_add_group_dev(struct device *dev, - const struct vfio_device_ops *ops, - void *device_data); +extern struct vfio_device *vfio_add_group_dev(struct device *dev, + const struct vfio_device_ops *ops, + void *device_data); extern void *vfio_del_group_dev(struct device *dev); extern struct vfio_device *vfio_device_get_from_dev(struct device *dev); From patchwork Mon Feb 22 16:50:59 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alex Williamson X-Patchwork-Id: 12099177 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 34216C433DB for ; Mon, 22 Feb 2021 16:53:19 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 02DF864F08 for ; Mon, 22 Feb 2021 16:53:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231443AbhBVQxD (ORCPT ); Mon, 22 Feb 2021 11:53:03 -0500 Received: from us-smtp-delivery-124.mimecast.com ([63.128.21.124]:24498 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231340AbhBVQwg (ORCPT ); Mon, 22 Feb 2021 11:52:36 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; 
Subject: [RFC PATCH 03/10] vfio: Export unmap_mapping_range() wrapper
From: Alex Williamson
To: alex.williamson@redhat.com
Cc: cohuck@redhat.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
    jgg@nvidia.com, peterx@redhat.com
Date: Mon, 22 Feb 2021 09:50:59 -0700
Message-ID: <161401265929.16443.14593298513137995113.stgit@gimli.home>
In-Reply-To: <161401167013.16443.8389863523766611711.stgit@gimli.home>
References: <161401167013.16443.8389863523766611711.stgit@gimli.home>

Allow bus drivers to use vfio pseudo fs mapping to zap all mmaps
across a range of their device files.
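For example, vfio-pci (patch 04 in this series) uses this to zap every BAR
mmap in one call, using its fixed region-to-offset encoding (sketch of that
usage, shown here only for context):

	/* Zap mmaps of BAR0 through BAR5, i.e. everything below the ROM. */
	vfio_device_unmap_mapping_range(vdev->device,
		VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_BAR0_REGION_INDEX),
		VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_ROM_REGION_INDEX) -
		VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_BAR0_REGION_INDEX));

Later faults then re-establish the mappings only while the conditions the
bus driver checks (e.g. PCI memory enable) still hold.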
Signed-off-by: Alex Williamson
---
 drivers/vfio/vfio.c  |    7 +++++++
 include/linux/vfio.h |    2 ++
 2 files changed, 9 insertions(+)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 067cd843961c..da212425ab30 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -565,6 +565,13 @@ static struct inode *vfio_fs_inode_new(void)
 	return inode;
 }
 
+void vfio_device_unmap_mapping_range(struct vfio_device *device,
+				     loff_t start, loff_t len)
+{
+	unmap_mapping_range(device->inode->i_mapping, start, len, true);
+}
+EXPORT_SYMBOL_GPL(vfio_device_unmap_mapping_range);
+
 /**
  * Device objects - create, release, get, put, search
  */
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index b784463000d4..f435dfca15eb 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -56,6 +56,8 @@ extern void *vfio_del_group_dev(struct device *dev);
 extern struct vfio_device *vfio_device_get_from_dev(struct device *dev);
 extern void vfio_device_put(struct vfio_device *device);
 extern void *vfio_device_data(struct vfio_device *device);
+extern void vfio_device_unmap_mapping_range(struct vfio_device *device,
+					    loff_t start, loff_t len);
 
 /* events for the backend driver notify callback */
 enum vfio_iommu_notify_type {

From patchwork Mon Feb 22 16:51:13 2021
X-Patchwork-Submitter: Alex Williamson
X-Patchwork-Id: 12099179
(Postfix) with ESMTPS id 4A2D0801979; Mon, 22 Feb 2021 16:51:20 +0000 (UTC) Received: from gimli.home (ovpn-112-255.phx2.redhat.com [10.3.112.255]) by smtp.corp.redhat.com (Postfix) with ESMTP id 873B95D9D0; Mon, 22 Feb 2021 16:51:13 +0000 (UTC) Subject: [RFC PATCH 04/10] vfio/pci: Use vfio_device_unmap_mapping_range() From: Alex Williamson To: alex.williamson@redhat.com Cc: cohuck@redhat.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, jgg@nvidia.com, peterx@redhat.com Date: Mon, 22 Feb 2021 09:51:13 -0700 Message-ID: <161401267316.16443.11184767955094847849.stgit@gimli.home> In-Reply-To: <161401167013.16443.8389863523766611711.stgit@gimli.home> References: <161401167013.16443.8389863523766611711.stgit@gimli.home> User-Agent: StGit/0.21-dirty MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org With the vfio device fd tied to the address space of the pseudo fs inode, we can use the mm to track all vmas that might be mmap'ing device BARs, which removes our vma_list and all the complicated lock ordering necessary to manually zap each related vma. Suggested-by: Jason Gunthorpe Signed-off-by: Alex Williamson --- drivers/vfio/pci/vfio_pci.c | 217 ++++------------------------------- drivers/vfio/pci/vfio_pci_private.h | 3 2 files changed, 28 insertions(+), 192 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index f0a1d05f0137..115f10f7b096 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -225,7 +225,7 @@ static void vfio_pci_probe_mmaps(struct vfio_pci_device *vdev) static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev); static void vfio_pci_disable(struct vfio_pci_device *vdev); -static int vfio_pci_try_zap_and_vma_lock_cb(struct pci_dev *pdev, void *data); +static int vfio_pci_mem_trylock_and_zap_cb(struct pci_dev *pdev, void *data); /* * INTx masking requires the ability to disable INTx signaling via PCI_COMMAND @@ -1168,7 +1168,7 @@ static long vfio_pci_ioctl(void *device_data, struct vfio_pci_group_info info; struct vfio_devices devs = { .cur_index = 0 }; bool slot = false; - int i, group_idx, mem_idx = 0, count = 0, ret = 0; + int i, group_idx, count = 0, ret = 0; minsz = offsetofend(struct vfio_pci_hot_reset, count); @@ -1268,32 +1268,16 @@ static long vfio_pci_ioctl(void *device_data, } /* - * We need to get memory_lock for each device, but devices - * can share mmap_lock, therefore we need to zap and hold - * the vma_lock for each device, and only then get each - * memory_lock. + * Try to get the memory_lock write lock for all devices and + * zap all BAR mmaps. 
*/ ret = vfio_pci_for_each_slot_or_bus(vdev->pdev, - vfio_pci_try_zap_and_vma_lock_cb, + vfio_pci_mem_trylock_and_zap_cb, &devs, slot); - if (ret) - goto hot_reset_release; - - for (; mem_idx < devs.cur_index; mem_idx++) { - struct vfio_pci_device *tmp; - - tmp = vfio_device_data(devs.devices[mem_idx]); - - ret = down_write_trylock(&tmp->memory_lock); - if (!ret) { - ret = -EBUSY; - goto hot_reset_release; - } - mutex_unlock(&tmp->vma_lock); - } /* User has access, do the reset */ - ret = pci_reset_bus(vdev->pdev); + if (!ret) + ret = pci_reset_bus(vdev->pdev); hot_reset_release: for (i = 0; i < devs.cur_index; i++) { @@ -1303,10 +1287,7 @@ static long vfio_pci_ioctl(void *device_data, device = devs.devices[i]; tmp = vfio_device_data(device); - if (i < mem_idx) - up_write(&tmp->memory_lock); - else - mutex_unlock(&tmp->vma_lock); + up_write(&tmp->memory_lock); vfio_device_put(device); } kfree(devs.devices); @@ -1452,100 +1433,18 @@ static ssize_t vfio_pci_write(void *device_data, const char __user *buf, return vfio_pci_rw(device_data, (char __user *)buf, count, ppos, true); } -/* Return 1 on zap and vma_lock acquired, 0 on contention (only with @try) */ -static int vfio_pci_zap_and_vma_lock(struct vfio_pci_device *vdev, bool try) +static void vfio_pci_zap_bars(struct vfio_pci_device *vdev) { - struct vfio_pci_mmap_vma *mmap_vma, *tmp; - - /* - * Lock ordering: - * vma_lock is nested under mmap_lock for vm_ops callback paths. - * The memory_lock semaphore is used by both code paths calling - * into this function to zap vmas and the vm_ops.fault callback - * to protect the memory enable state of the device. - * - * When zapping vmas we need to maintain the mmap_lock => vma_lock - * ordering, which requires using vma_lock to walk vma_list to - * acquire an mm, then dropping vma_lock to get the mmap_lock and - * reacquiring vma_lock. This logic is derived from similar - * requirements in uverbs_user_mmap_disassociate(). - * - * mmap_lock must always be the top-level lock when it is taken. - * Therefore we can only hold the memory_lock write lock when - * vma_list is empty, as we'd need to take mmap_lock to clear - * entries. vma_list can only be guaranteed empty when holding - * vma_lock, thus memory_lock is nested under vma_lock. - * - * This enables the vm_ops.fault callback to acquire vma_lock, - * followed by memory_lock read lock, while already holding - * mmap_lock without risk of deadlock. 
- */ - while (1) { - struct mm_struct *mm = NULL; - - if (try) { - if (!mutex_trylock(&vdev->vma_lock)) - return 0; - } else { - mutex_lock(&vdev->vma_lock); - } - while (!list_empty(&vdev->vma_list)) { - mmap_vma = list_first_entry(&vdev->vma_list, - struct vfio_pci_mmap_vma, - vma_next); - mm = mmap_vma->vma->vm_mm; - if (mmget_not_zero(mm)) - break; - - list_del(&mmap_vma->vma_next); - kfree(mmap_vma); - mm = NULL; - } - if (!mm) - return 1; - mutex_unlock(&vdev->vma_lock); - - if (try) { - if (!mmap_read_trylock(mm)) { - mmput(mm); - return 0; - } - } else { - mmap_read_lock(mm); - } - if (try) { - if (!mutex_trylock(&vdev->vma_lock)) { - mmap_read_unlock(mm); - mmput(mm); - return 0; - } - } else { - mutex_lock(&vdev->vma_lock); - } - list_for_each_entry_safe(mmap_vma, tmp, - &vdev->vma_list, vma_next) { - struct vm_area_struct *vma = mmap_vma->vma; - - if (vma->vm_mm != mm) - continue; - - list_del(&mmap_vma->vma_next); - kfree(mmap_vma); - - zap_vma_ptes(vma, vma->vm_start, - vma->vm_end - vma->vm_start); - } - mutex_unlock(&vdev->vma_lock); - mmap_read_unlock(mm); - mmput(mm); - } + vfio_device_unmap_mapping_range(vdev->device, + VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_BAR0_REGION_INDEX), + VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_ROM_REGION_INDEX) - + VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_BAR0_REGION_INDEX)); } void vfio_pci_zap_and_down_write_memory_lock(struct vfio_pci_device *vdev) { - vfio_pci_zap_and_vma_lock(vdev, false); down_write(&vdev->memory_lock); - mutex_unlock(&vdev->vma_lock); + vfio_pci_zap_bars(vdev); } u16 vfio_pci_memory_lock_and_enable(struct vfio_pci_device *vdev) @@ -1567,82 +1466,25 @@ void vfio_pci_memory_unlock_and_restore(struct vfio_pci_device *vdev, u16 cmd) up_write(&vdev->memory_lock); } -/* Caller holds vma_lock */ -static int __vfio_pci_add_vma(struct vfio_pci_device *vdev, - struct vm_area_struct *vma) -{ - struct vfio_pci_mmap_vma *mmap_vma; - - mmap_vma = kmalloc(sizeof(*mmap_vma), GFP_KERNEL); - if (!mmap_vma) - return -ENOMEM; - - mmap_vma->vma = vma; - list_add(&mmap_vma->vma_next, &vdev->vma_list); - - return 0; -} - -/* - * Zap mmaps on open so that we can fault them in on access and therefore - * our vma_list only tracks mappings accessed since last zap. 
- */ -static void vfio_pci_mmap_open(struct vm_area_struct *vma) -{ - zap_vma_ptes(vma, vma->vm_start, vma->vm_end - vma->vm_start); -} - -static void vfio_pci_mmap_close(struct vm_area_struct *vma) -{ - struct vfio_pci_device *vdev = vma->vm_private_data; - struct vfio_pci_mmap_vma *mmap_vma; - - mutex_lock(&vdev->vma_lock); - list_for_each_entry(mmap_vma, &vdev->vma_list, vma_next) { - if (mmap_vma->vma == vma) { - list_del(&mmap_vma->vma_next); - kfree(mmap_vma); - break; - } - } - mutex_unlock(&vdev->vma_lock); -} - static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf) { struct vm_area_struct *vma = vmf->vma; struct vfio_pci_device *vdev = vma->vm_private_data; - vm_fault_t ret = VM_FAULT_NOPAGE; + vm_fault_t ret = VM_FAULT_SIGBUS; - mutex_lock(&vdev->vma_lock); down_read(&vdev->memory_lock); - if (!__vfio_pci_memory_enabled(vdev)) { - ret = VM_FAULT_SIGBUS; - mutex_unlock(&vdev->vma_lock); - goto up_out; - } - - if (__vfio_pci_add_vma(vdev, vma)) { - ret = VM_FAULT_OOM; - mutex_unlock(&vdev->vma_lock); - goto up_out; - } - - mutex_unlock(&vdev->vma_lock); + if (__vfio_pci_memory_enabled(vdev) && + !io_remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff, + vma->vm_end - vma->vm_start, vma->vm_page_prot)) + ret = VM_FAULT_NOPAGE; - if (io_remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff, - vma->vm_end - vma->vm_start, vma->vm_page_prot)) - ret = VM_FAULT_SIGBUS; - -up_out: up_read(&vdev->memory_lock); + return ret; } static const struct vm_operations_struct vfio_pci_mmap_ops = { - .open = vfio_pci_mmap_open, - .close = vfio_pci_mmap_close, .fault = vfio_pci_mmap_fault, }; @@ -1926,7 +1768,6 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) { struct vfio_pci_device *vdev; struct iommu_group *group; - struct vfio_device *device; int ret; if (vfio_pci_is_denylisted(pdev)) @@ -1965,13 +1806,11 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) mutex_init(&vdev->ioeventfds_lock); INIT_LIST_HEAD(&vdev->dummy_resources_list); INIT_LIST_HEAD(&vdev->ioeventfds_list); - mutex_init(&vdev->vma_lock); - INIT_LIST_HEAD(&vdev->vma_list); init_rwsem(&vdev->memory_lock); - device = vfio_add_group_dev(&pdev->dev, &vfio_pci_ops, vdev); - if (IS_ERR(device)) { - ret = PTR_ERR(device); + vdev->device = vfio_add_group_dev(&pdev->dev, &vfio_pci_ops, vdev); + if (IS_ERR(vdev->device)) { + ret = PTR_ERR(vdev->device); goto out_free; } @@ -2253,7 +2092,7 @@ static int vfio_pci_get_unused_devs(struct pci_dev *pdev, void *data) return 0; } -static int vfio_pci_try_zap_and_vma_lock_cb(struct pci_dev *pdev, void *data) +static int vfio_pci_mem_trylock_and_zap_cb(struct pci_dev *pdev, void *data) { struct vfio_devices *devs = data; struct vfio_device *device; @@ -2273,15 +2112,13 @@ static int vfio_pci_try_zap_and_vma_lock_cb(struct pci_dev *pdev, void *data) vdev = vfio_device_data(device); - /* - * Locking multiple devices is prone to deadlock, runaway and - * unwind if we hit contention. 
- */ - if (!vfio_pci_zap_and_vma_lock(vdev, true)) { + if (!down_write_trylock(&vdev->memory_lock)) { vfio_device_put(device); return -EBUSY; } + vfio_pci_zap_bars(vdev); + devs->devices[devs->cur_index++] = device; return 0; } diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h index 9cd1882a05af..ba37f4eeefd0 100644 --- a/drivers/vfio/pci/vfio_pci_private.h +++ b/drivers/vfio/pci/vfio_pci_private.h @@ -101,6 +101,7 @@ struct vfio_pci_mmap_vma { struct vfio_pci_device { struct pci_dev *pdev; + struct vfio_device *device; void __iomem *barmap[PCI_STD_NUM_BARS]; bool bar_mmap_supported[PCI_STD_NUM_BARS]; u8 *pci_config_map; @@ -139,8 +140,6 @@ struct vfio_pci_device { struct list_head ioeventfds_list; struct vfio_pci_vf_token *vf_token; struct notifier_block nb; - struct mutex vma_lock; - struct list_head vma_list; struct rw_semaphore memory_lock; }; From patchwork Mon Feb 22 16:51:25 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alex Williamson X-Patchwork-Id: 12099181 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5BF56C433DB for ; Mon, 22 Feb 2021 16:53:50 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2792864F00 for ; Mon, 22 Feb 2021 16:53:50 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231473AbhBVQxS (ORCPT ); Mon, 22 Feb 2021 11:53:18 -0500 Received: from us-smtp-delivery-124.mimecast.com ([63.128.21.124]:49312 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231359AbhBVQxC (ORCPT ); Mon, 22 Feb 2021 11:53:02 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1614012696; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=cEeUgZ0gYEuJNqC4jH1I44GKM21P1WyDVnv51cT+tDk=; b=XWYILxXhRxGcryCeZUYWyIm3UfG0fGg+kf8KGx0Yv5mDV9mmNW3hO0jVAZwqXIujs499EW gUMDkuSAnIg0FJa9HTXFlIPJZP1K0PlcDeplfuOto2NJqwf7uxAuv1r15e8JLSoWQnUn38 IkLN8BWNOMhylBvnPmQpFOb28f7DhO0= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-282-imn9u1fnMvekfwV_Lz63zQ-1; Mon, 22 Feb 2021 11:51:34 -0500 X-MC-Unique: imn9u1fnMvekfwV_Lz63zQ-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id A575F192AB78; Mon, 22 Feb 2021 16:51:33 +0000 (UTC) Received: from gimli.home (ovpn-112-255.phx2.redhat.com [10.3.112.255]) by smtp.corp.redhat.com (Postfix) with ESMTP id BBA8660C04; Mon, 22 Feb 2021 16:51:25 +0000 (UTC) Subject: [RFC PATCH 05/10] vfio: Create a vfio_device from vma lookup From: Alex Williamson 
To: alex.williamson@redhat.com Cc: cohuck@redhat.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, jgg@nvidia.com, peterx@redhat.com Date: Mon, 22 Feb 2021 09:51:25 -0700 Message-ID: <161401268537.16443.2329805617992345365.stgit@gimli.home> In-Reply-To: <161401167013.16443.8389863523766611711.stgit@gimli.home> References: <161401167013.16443.8389863523766611711.stgit@gimli.home> User-Agent: StGit/0.21-dirty MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Introduce a vfio bus driver policy where using the exported vfio_device_vma_open() as the vm_ops.open for a vma indicates vm_private_data is the struct vfio_device pointer associated to the vma. This allows a direct vma to device lookup. Operating on an active, open vma to the device, we should be able to directly increment the vfio_device reference. Implemented only for vfio-pci for now. Suggested-by: Jason Gunthorpe Signed-off-by: Alex Williamson --- drivers/vfio/pci/vfio_pci.c | 6 ++++-- drivers/vfio/vfio.c | 24 ++++++++++++++++++++++++ include/linux/vfio.h | 2 ++ 3 files changed, 30 insertions(+), 2 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c index 115f10f7b096..f9529bac6c97 100644 --- a/drivers/vfio/pci/vfio_pci.c +++ b/drivers/vfio/pci/vfio_pci.c @@ -1469,7 +1469,8 @@ void vfio_pci_memory_unlock_and_restore(struct vfio_pci_device *vdev, u16 cmd) static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf) { struct vm_area_struct *vma = vmf->vma; - struct vfio_pci_device *vdev = vma->vm_private_data; + struct vfio_device *device = vma->vm_private_data; + struct vfio_pci_device *vdev = vfio_device_data(device); vm_fault_t ret = VM_FAULT_SIGBUS; down_read(&vdev->memory_lock); @@ -1485,6 +1486,7 @@ static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf) } static const struct vm_operations_struct vfio_pci_mmap_ops = { + .open = vfio_device_vma_open, .fault = vfio_pci_mmap_fault, }; @@ -1542,7 +1544,7 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma) } } - vma->vm_private_data = vdev; + vma->vm_private_data = vdev->device; vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); vma->vm_pgoff = (pci_resource_start(pdev, index) >> PAGE_SHIFT) + pgoff; diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index da212425ab30..399c42b77fbb 100644 --- a/drivers/vfio/vfio.c +++ b/drivers/vfio/vfio.c @@ -572,6 +572,15 @@ void vfio_device_unmap_mapping_range(struct vfio_device *device, } EXPORT_SYMBOL_GPL(vfio_device_unmap_mapping_range); +/* + * A VFIO bus driver using this open callback will provide a + * struct vfio_device pointer in the vm_private_data field. 
+ */ +void vfio_device_vma_open(struct vm_area_struct *vma) +{ +} +EXPORT_SYMBOL_GPL(vfio_device_vma_open); + /** * Device objects - create, release, get, put, search */ @@ -927,6 +936,21 @@ struct vfio_device *vfio_device_get_from_dev(struct device *dev) } EXPORT_SYMBOL_GPL(vfio_device_get_from_dev); +struct vfio_device *vfio_device_get_from_vma(struct vm_area_struct *vma) +{ + struct vfio_device *device; + + if (vma->vm_ops->open != vfio_device_vma_open) + return ERR_PTR(-ENODEV); + + device = vma->vm_private_data; + + vfio_device_get(device); + + return device; +} +EXPORT_SYMBOL_GPL(vfio_device_get_from_vma); + static struct vfio_device *vfio_device_get_from_name(struct vfio_group *group, char *buf) { diff --git a/include/linux/vfio.h b/include/linux/vfio.h index f435dfca15eb..188c2f3feed9 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -58,6 +58,8 @@ extern void vfio_device_put(struct vfio_device *device); extern void *vfio_device_data(struct vfio_device *device); extern void vfio_device_unmap_mapping_range(struct vfio_device *device, loff_t start, loff_t len); +extern void vfio_device_vma_open(struct vm_area_struct *vma); +extern struct vfio_device *vfio_device_get_from_vma(struct vm_area_struct *vma); /* events for the backend driver notify callback */ enum vfio_iommu_notify_type { From patchwork Mon Feb 22 16:51:38 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alex Williamson X-Patchwork-Id: 12099183 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7757DC433E0 for ; Mon, 22 Feb 2021 16:54:13 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3C48F64EF5 for ; Mon, 22 Feb 2021 16:54:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231499AbhBVQxv (ORCPT ); Mon, 22 Feb 2021 11:53:51 -0500 Received: from us-smtp-delivery-124.mimecast.com ([63.128.21.124]:41485 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231478AbhBVQxU (ORCPT ); Mon, 22 Feb 2021 11:53:20 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1614012714; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ScOwtZfsBfs3LX8QPLUz2J4hjmFvF5QjW3WOO14yaj8=; b=WEYIcp/rEeAeiIoAPdolDIaF9wgV/NwD0/Nku/k21ZlTBV7G/xg7ZGC8RPBCHBZiZNBnUv yC6xXV16wM7fg0EdgzIoITmwnnM9Bn/pUjUo8WEt1XvmiYrSGiUkICx3NyJ7qz2TRa2GvS U56RhlWTXCl7P//zfNFYdpJ7P5blw1A= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-360-fHKP-MOyNQOfDhHfKM2YeQ-1; Mon, 22 Feb 2021 11:51:51 -0500 X-MC-Unique: fHKP-MOyNQOfDhHfKM2YeQ-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA 
(256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 35A00CC623; Mon, 22 Feb 2021 16:51:50 +0000 (UTC) Received: from gimli.home (ovpn-112-255.phx2.redhat.com [10.3.112.255]) by smtp.corp.redhat.com (Postfix) with ESMTP id 22A5257; Mon, 22 Feb 2021 16:51:39 +0000 (UTC) Subject: [RFC PATCH 06/10] vfio: Add a device notifier interface From: Alex Williamson To: alex.williamson@redhat.com Cc: cohuck@redhat.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, jgg@nvidia.com, peterx@redhat.com Date: Mon, 22 Feb 2021 09:51:38 -0700 Message-ID: <161401269874.16443.4238313694176658818.stgit@gimli.home> In-Reply-To: <161401167013.16443.8389863523766611711.stgit@gimli.home> References: <161401167013.16443.8389863523766611711.stgit@gimli.home> User-Agent: StGit/0.21-dirty MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Using a vfio device, a notifier block can be registered to receive select device events. Notifiers can only be registered for contained devices, ie. they are available through a user context. Registration of a notifier increments the reference to that container context therefore notifiers must minimally respond to the release event by asynchronously removing notifiers. Signed-off-by: Alex Williamson --- drivers/vfio/Kconfig | 1 + drivers/vfio/vfio.c | 35 +++++++++++++++++++++++++++++++++++ include/linux/vfio.h | 9 +++++++++ 3 files changed, 45 insertions(+) diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig index 5533df91b257..8ac1601b681b 100644 --- a/drivers/vfio/Kconfig +++ b/drivers/vfio/Kconfig @@ -23,6 +23,7 @@ menuconfig VFIO tristate "VFIO Non-Privileged userspace driver framework" depends on IOMMU_API select VFIO_IOMMU_TYPE1 if (X86 || S390 || ARM || ARM64) + select SRCU help VFIO provides a framework for secure userspace device drivers. See Documentation/driver-api/vfio.rst for more details. 
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index 399c42b77fbb..1a1b46215ac4 100644 --- a/drivers/vfio/vfio.c +++ b/drivers/vfio/vfio.c @@ -105,6 +105,7 @@ struct vfio_device { struct list_head group_next; void *device_data; struct inode *inode; + struct srcu_notifier_head notifier; }; #ifdef CONFIG_VFIO_NOIOMMU @@ -610,6 +611,7 @@ struct vfio_device *vfio_group_create_device(struct vfio_group *group, device->ops = ops; device->device_data = device_data; dev_set_drvdata(dev, device); + srcu_init_notifier_head(&device->notifier); /* No need to get group_lock, caller has group reference */ vfio_group_get(group); @@ -1778,6 +1780,39 @@ static const struct file_operations vfio_device_fops = { .mmap = vfio_device_fops_mmap, }; +int vfio_device_register_notifier(struct vfio_device *device, + struct notifier_block *nb) +{ + int ret; + + /* Container ref persists until unregister on success */ + ret = vfio_group_add_container_user(device->group); + if (ret) + return ret; + + ret = srcu_notifier_chain_register(&device->notifier, nb); + if (ret) + vfio_group_try_dissolve_container(device->group); + + return ret; +} +EXPORT_SYMBOL_GPL(vfio_device_register_notifier); + +void vfio_device_unregister_notifier(struct vfio_device *device, + struct notifier_block *nb) +{ + if (!srcu_notifier_chain_unregister(&device->notifier, nb)) + vfio_group_try_dissolve_container(device->group); +} +EXPORT_SYMBOL_GPL(vfio_device_unregister_notifier); + +int vfio_device_notifier_call(struct vfio_device *device, + enum vfio_device_notify_type event) +{ + return srcu_notifier_call_chain(&device->notifier, event, NULL); +} +EXPORT_SYMBOL_GPL(vfio_device_notifier_call); + /** * External user API, exported by symbols to be linked dynamically. * diff --git a/include/linux/vfio.h b/include/linux/vfio.h index 188c2f3feed9..8217cd4ea53d 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -60,6 +60,15 @@ extern void vfio_device_unmap_mapping_range(struct vfio_device *device, loff_t start, loff_t len); extern void vfio_device_vma_open(struct vm_area_struct *vma); extern struct vfio_device *vfio_device_get_from_vma(struct vm_area_struct *vma); +extern int vfio_device_register_notifier(struct vfio_device *device, + struct notifier_block *nb); +extern void vfio_device_unregister_notifier(struct vfio_device *device, + struct notifier_block *nb); +enum vfio_device_notify_type { + VFIO_DEVICE_RELEASE = 0, +}; +int vfio_device_notifier_call(struct vfio_device *device, + enum vfio_device_notify_type event); /* events for the backend driver notify callback */ enum vfio_iommu_notify_type { From patchwork Mon Feb 22 16:51:55 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Alex Williamson X-Patchwork-Id: 12099185 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 359D4C433E6 for ; Mon, 22 Feb 2021 16:54:16 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id F196D64F04 for ; Mon, 22 Feb 2021 16:54:15 +0000 (UTC) Received: 
Subject: [RFC PATCH 07/10] vfio/pci: Notify on device release
From: Alex Williamson
To: alex.williamson@redhat.com
Cc: cohuck@redhat.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
    jgg@nvidia.com, peterx@redhat.com
Date: Mon, 22 Feb 2021 09:51:55 -0700
Message-ID: <161401271528.16443.2318400142031983698.stgit@gimli.home>
In-Reply-To: <161401167013.16443.8389863523766611711.stgit@gimli.home>
References: <161401167013.16443.8389863523766611711.stgit@gimli.home>

Trigger a release notifier call when open reference count is zero.
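A consumer registered with vfio_device_register_notifier() (patch 06) sees
this as a VFIO_DEVICE_RELEASE event.  Since registration holds a container
reference, the minimal handler defers its own unregistration to asynchronous
context rather than doing it from the callback.  Rough sketch; struct
my_tracker and its release_work are placeholders, not part of this series:

	static int my_dev_notify(struct notifier_block *nb,
				 unsigned long event, void *unused)
	{
		struct my_tracker *t = container_of(nb, struct my_tracker, nb);

		if (event == VFIO_DEVICE_RELEASE)
			schedule_work(&t->release_work); /* unregisters there */

		return NOTIFY_OK;
	}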
Signed-off-by: Alex Williamson
---
 drivers/vfio/pci/vfio_pci.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index f9529bac6c97..fb8307430e24 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -560,6 +560,7 @@ static void vfio_pci_release(void *device_data)
 	mutex_lock(&vdev->reflck->lock);
 
 	if (!(--vdev->refcnt)) {
+		vfio_device_notifier_call(vdev->device, VFIO_DEVICE_RELEASE);
 		vfio_pci_vf_token_user_add(vdev, -1);
 		vfio_spapr_pci_eeh_release(vdev->pdev);
 		vfio_pci_disable(vdev);

From patchwork Mon Feb 22 16:52:07 2021
X-Patchwork-Submitter: Alex Williamson
X-Patchwork-Id: 12099187
Subject: [RFC PATCH 08/10] vfio/type1: Refactor pfn_list clearing
From: Alex Williamson
To: alex.williamson@redhat.com
Cc: cohuck@redhat.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
    jgg@nvidia.com, peterx@redhat.com
Date: Mon, 22 Feb 2021 09:52:07 -0700
Message-ID: <161401272758.16443.12511933342363753977.stgit@gimli.home>
In-Reply-To: <161401167013.16443.8389863523766611711.stgit@gimli.home>
References:
<161401167013.16443.8389863523766611711.stgit@gimli.home> User-Agent: StGit/0.21-dirty MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 Precedence: bulk List-ID: X-Mailing-List: kvm@vger.kernel.org Pull code out to a function for re-use. Signed-off-by: Alex Williamson --- drivers/vfio/vfio_iommu_type1.c | 57 +++++++++++++++++++++++---------------- 1 file changed, 34 insertions(+), 23 deletions(-) diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index b3df383d7028..5099b3c9dce0 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -484,6 +484,39 @@ static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm, return ret; } +/* Return 1 if iommu->lock dropped and notified, 0 if done */ +static int unmap_dma_pfn_list(struct vfio_iommu *iommu, struct vfio_dma *dma, + struct vfio_dma **dma_last, int *retries) +{ + if (!RB_EMPTY_ROOT(&dma->pfn_list)) { + struct vfio_iommu_type1_dma_unmap nb_unmap; + + if (*dma_last == dma) { + BUG_ON(++(*retries) > 10); + } else { + *dma_last = dma; + *retries = 0; + } + + nb_unmap.iova = dma->iova; + nb_unmap.size = dma->size; + + /* + * Notify anyone (mdev vendor drivers) to invalidate and + * unmap iovas within the range we're about to unmap. + * Vendor drivers MUST unpin pages in response to an + * invalidation. + */ + mutex_unlock(&iommu->lock); + blocking_notifier_call_chain(&iommu->notifier, + VFIO_IOMMU_NOTIFY_DMA_UNMAP, + &nb_unmap); + return 1; + } + + return 0; +} + static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr, int prot, unsigned long *pfn) { @@ -1307,29 +1340,7 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu, continue; } - if (!RB_EMPTY_ROOT(&dma->pfn_list)) { - struct vfio_iommu_type1_dma_unmap nb_unmap; - - if (dma_last == dma) { - BUG_ON(++retries > 10); - } else { - dma_last = dma; - retries = 0; - } - - nb_unmap.iova = dma->iova; - nb_unmap.size = dma->size; - - /* - * Notify anyone (mdev vendor drivers) to invalidate and - * unmap iovas within the range we're about to unmap. - * Vendor drivers MUST unpin pages in response to an - * invalidation. 
-			 */
-			mutex_unlock(&iommu->lock);
-			blocking_notifier_call_chain(&iommu->notifier,
-						    VFIO_IOMMU_NOTIFY_DMA_UNMAP,
-						    &nb_unmap);
+		if (unmap_dma_pfn_list(iommu, dma, &dma_last, &retries)) {
 			mutex_lock(&iommu->lock);
 			goto again;
 		}

From patchwork Mon Feb 22 16:52:20 2021
X-Patchwork-Submitter: Alex Williamson
X-Patchwork-Id: 12099189
Subject: [RFC PATCH 09/10] vfio/type1: Pass iommu and dma objects through to vaddr_get_pfn
From: Alex Williamson
To: alex.williamson@redhat.com
Cc: cohuck@redhat.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
    jgg@nvidia.com, peterx@redhat.com
Date: Mon, 22 Feb 2021 09:52:20 -0700
Message-ID: <161401274019.16443.9688799433041636464.stgit@gimli.home>
In-Reply-To: <161401167013.16443.8389863523766611711.stgit@gimli.home>
References: <161401167013.16443.8389863523766611711.stgit@gimli.home>

We'll need these to track vfio device mappings.
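Concretely, the helper's signature changes from

	static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
				 int prot, unsigned long *pfn);

to

	static int vaddr_get_pfn(struct vfio_iommu *iommu, struct vfio_dma *dma,
				 struct mm_struct *mm, unsigned long vaddr,
				 unsigned long *pfn);

with the protection flags now read from dma->prot.  The iommu and dma
arguments are unused so far; presumably a later change records device
mappings against them.  The diff below is plumbing only.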
Signed-off-by: Alex Williamson
---
 drivers/vfio/vfio_iommu_type1.c |   30 +++++++++++++++++-------------
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 5099b3c9dce0..b34ee4b96a4a 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -517,15 +517,16 @@ static int unmap_dma_pfn_list(struct vfio_iommu *iommu, struct vfio_dma *dma,
 	return 0;
 }
 
-static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
-			 int prot, unsigned long *pfn)
+static int vaddr_get_pfn(struct vfio_iommu *iommu, struct vfio_dma *dma,
+			 struct mm_struct *mm, unsigned long vaddr,
+			 unsigned long *pfn)
 {
 	struct page *page[1];
 	struct vm_area_struct *vma;
 	unsigned int flags = 0;
 	int ret;
 
-	if (prot & IOMMU_WRITE)
+	if (dma->prot & IOMMU_WRITE)
 		flags |= FOLL_WRITE;
 
 	mmap_read_lock(mm);
@@ -543,7 +544,8 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
 	vma = find_vma_intersection(mm, vaddr, vaddr + 1);
 
 	if (vma && vma->vm_flags & VM_PFNMAP) {
-		ret = follow_fault_pfn(vma, mm, vaddr, pfn, prot & IOMMU_WRITE);
+		ret = follow_fault_pfn(vma, mm, vaddr, pfn,
+				       dma->prot & IOMMU_WRITE);
 		if (ret == -EAGAIN)
 			goto retry;
 
@@ -615,7 +617,8 @@ static int vfio_wait_all_valid(struct vfio_iommu *iommu)
  * the iommu can only map chunks of consecutive pfns anyway, so get the
  * first page and all consecutive pages with the same locking.
  */
-static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
+static long vfio_pin_pages_remote(struct vfio_iommu *iommu,
+				  struct vfio_dma *dma, unsigned long vaddr,
 				  long npage, unsigned long *pfn_base,
 				  unsigned long limit)
 {
@@ -628,7 +631,7 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
 	if (!current->mm)
 		return -ENODEV;
 
-	ret = vaddr_get_pfn(current->mm, vaddr, dma->prot, pfn_base);
+	ret = vaddr_get_pfn(iommu, dma, current->mm, vaddr, pfn_base);
 	if (ret)
 		return ret;
 
@@ -655,7 +658,7 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
 	/* Lock all the consecutive pages from pfn_base */
 	for (vaddr += PAGE_SIZE, iova += PAGE_SIZE; pinned < npage;
 	     pinned++, vaddr += PAGE_SIZE, iova += PAGE_SIZE) {
-		ret = vaddr_get_pfn(current->mm, vaddr, dma->prot, &pfn);
+		ret = vaddr_get_pfn(iommu, dma, current->mm, vaddr, &pfn);
 		if (ret)
 			break;
 
@@ -715,7 +718,8 @@ static long vfio_unpin_pages_remote(struct vfio_dma *dma, dma_addr_t iova,
 	return unlocked;
 }
 
-static int vfio_pin_page_external(struct vfio_dma *dma, unsigned long vaddr,
+static int vfio_pin_page_external(struct vfio_iommu *iommu,
+				  struct vfio_dma *dma, unsigned long vaddr,
 				  unsigned long *pfn_base, bool do_accounting)
 {
 	struct mm_struct *mm;
@@ -725,7 +729,7 @@ static int vfio_pin_page_external(struct vfio_dma *dma, unsigned long vaddr,
 	if (!mm)
 		return -ENODEV;
 
-	ret = vaddr_get_pfn(mm, vaddr, dma->prot, pfn_base);
+	ret = vaddr_get_pfn(iommu, dma, mm, vaddr, pfn_base);
 	if (!ret && do_accounting && !is_invalid_reserved_pfn(*pfn_base)) {
 		ret = vfio_lock_acct(dma, 1, true);
 		if (ret) {
@@ -833,8 +837,8 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
 		}
 
 		remote_vaddr = dma->vaddr + (iova - dma->iova);
-		ret = vfio_pin_page_external(dma, remote_vaddr, &phys_pfn[i],
-					     do_accounting);
+		ret = vfio_pin_page_external(iommu, dma, remote_vaddr,
					     &phys_pfn[i], do_accounting);
 		if (ret)
 			goto pin_unwind;
 
@@ -1404,7 +1408,7 @@ static int vfio_pin_map_dma(struct vfio_iommu *iommu, struct vfio_dma *dma,
 
 	while (size) {
 		/* Pin a contiguous chunk of memory */
-		npage = vfio_pin_pages_remote(dma, vaddr + dma->size,
+		npage = vfio_pin_pages_remote(iommu, dma, vaddr + dma->size,
 					      size >> PAGE_SHIFT, &pfn, limit);
 		if (npage <= 0) {
 			WARN_ON(!npage);
@@ -1660,7 +1664,7 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
 				size_t n = dma->iova + dma->size - iova;
 				long npage;
 
-				npage = vfio_pin_pages_remote(dma, vaddr,
+				npage = vfio_pin_pages_remote(iommu, dma, vaddr,
 							      n >> PAGE_SHIFT,
 							      &pfn, limit);
 				if (npage <= 0) {

From patchwork Mon Feb 22 16:52:32 2021
X-Patchwork-Submitter: Alex Williamson
X-Patchwork-Id: 12099191
Subject: [RFC PATCH 10/10] vfio/type1: Register device notifier
From: Alex Williamson
To: alex.williamson@redhat.com
Cc: cohuck@redhat.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
 jgg@nvidia.com, peterx@redhat.com
Date: Mon, 22 Feb 2021 09:52:32 -0700
Message-ID: <161401275279.16443.6350471385325897377.stgit@gimli.home>
In-Reply-To: <161401167013.16443.8389863523766611711.stgit@gimli.home>
References: <161401167013.16443.8389863523766611711.stgit@gimli.home>
Introduce a new default strict MMIO mapping mode where the vma for a
VM_PFNMAP mapping must be backed by a vfio device.  This allows holding a
reference to the device and registering a notifier for the device, which
additionally keeps the device in an IOMMU context for the extent of the
DMA mapping.  On notification of device release, automatically drop the
DMA mappings for it.

Signed-off-by: Alex Williamson
---
 drivers/vfio/vfio_iommu_type1.c |  124 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 123 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index b34ee4b96a4a..2a16257bd5b6 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -61,6 +61,11 @@ module_param_named(dma_entry_limit, dma_entry_limit, uint, 0644);
 MODULE_PARM_DESC(dma_entry_limit,
 		 "Maximum number of user DMA mappings per container (65535).");
 
+static bool strict_mmio_maps = true;
+module_param_named(strict_mmio_maps, strict_mmio_maps, bool, 0644);
+MODULE_PARM_DESC(strict_mmio_maps,
+		 "Restrict to safe DMA mappings of device memory (true).");
+
 struct vfio_iommu {
 	struct list_head	domain_list;
 	struct list_head	iova_list;
@@ -88,6 +93,14 @@ struct vfio_domain {
 	bool			fgsp;		/* Fine-grained super pages */
 };
 
+/* Req separate object for async removal from notifier vs dropping vfio_dma */
+struct pfnmap_obj {
+	struct notifier_block	nb;
+	struct work_struct	work;
+	struct vfio_iommu	*iommu;
+	struct vfio_device	*device;
+};
+
 struct vfio_dma {
 	struct rb_node		node;
 	dma_addr_t		iova;		/* Device address */
@@ -100,6 +113,7 @@ struct vfio_dma {
 	struct task_struct	*task;
 	struct rb_root		pfn_list;	/* Ex-user pinned pfn list */
 	unsigned long		*bitmap;
+	struct pfnmap_obj	*pfnmap;
 };
 
 struct vfio_group {
@@ -517,6 +531,68 @@ static int unmap_dma_pfn_list(struct vfio_iommu *iommu, struct vfio_dma *dma,
 	return 0;
 }
 
+static void unregister_device_bg(struct work_struct *work)
+{
+	struct pfnmap_obj *pfnmap = container_of(work, struct pfnmap_obj, work);
+
+	vfio_device_unregister_notifier(pfnmap->device, &pfnmap->nb);
+	vfio_device_put(pfnmap->device);
+	kfree(pfnmap);
+}
+
+/*
+ * pfnmap object can exist beyond the dma mapping referencing it, but it holds
+ * a container reference assuring the iommu exists.  Find the dma, if exists.
+ */
+struct vfio_dma *pfnmap_find_dma(struct pfnmap_obj *pfnmap)
+{
+	struct rb_node *n;
+
+	for (n = rb_first(&pfnmap->iommu->dma_list); n; n = rb_next(n)) {
+		struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
+
+		if (dma->pfnmap == pfnmap)
+			return dma;
+	}
+
+	return NULL;
+}
+
+static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma);
+
+static int vfio_device_nb_cb(struct notifier_block *nb,
+			     unsigned long action, void *unused)
+{
+	struct pfnmap_obj *pfnmap = container_of(nb, struct pfnmap_obj, nb);
+
+	switch (action) {
+	case VFIO_DEVICE_RELEASE:
+	{
+		struct vfio_dma *dma, *dma_last = NULL;
+		int retries = 0;
+again:
+		mutex_lock(&pfnmap->iommu->lock);
+		dma = pfnmap_find_dma(pfnmap);
+		if (dma) {
+			if (unmap_dma_pfn_list(pfnmap->iommu, dma,
+					       &dma_last, &retries))
+				goto again;
+
+			dma->pfnmap = NULL;
+			vfio_remove_dma(pfnmap->iommu, dma);
+		}
+		mutex_unlock(&pfnmap->iommu->lock);
+
+		/* Cannot unregister notifier from callback chain */
+		INIT_WORK(&pfnmap->work, unregister_device_bg);
+		schedule_work(&pfnmap->work);
+		break;
+	}
+	}
+
+	return NOTIFY_OK;
+}
+
 static int vaddr_get_pfn(struct vfio_iommu *iommu, struct vfio_dma *dma,
 			 struct mm_struct *mm, unsigned long vaddr,
 			 unsigned long *pfn)
@@ -549,8 +625,48 @@ static int vaddr_get_pfn(struct vfio_iommu *iommu, struct vfio_dma *dma,
 		if (ret == -EAGAIN)
 			goto retry;
 
-		if (!ret && !is_invalid_reserved_pfn(*pfn))
+		if (!ret && !is_invalid_reserved_pfn(*pfn)) {
 			ret = -EFAULT;
+			goto done;
+		}
+
+		if (!dma->pfnmap) {
+			struct pfnmap_obj *pfnmap;
+			struct vfio_device *device;
+
+			pfnmap = kzalloc(sizeof(*pfnmap), GFP_KERNEL);
+			if (!pfnmap) {
+				ret = -ENOMEM;
+				goto done;
+			}
+
+			pfnmap->iommu = iommu;
+			pfnmap->nb.notifier_call = vfio_device_nb_cb;
+
+			device = vfio_device_get_from_vma(vma);
+			if (IS_ERR(device)) {
+				kfree(pfnmap);
+				if (strict_mmio_maps)
+					ret = PTR_ERR(device);
+
+				goto done;
+			}
+
+			pfnmap->device = device;
+			ret = vfio_device_register_notifier(device,
+							    &pfnmap->nb);
+			if (ret) {
+				vfio_device_put(device);
+				kfree(pfnmap);
+				if (!strict_mmio_maps)
+					ret = 0;
+
+				goto done;
+			}
+
+			dma->pfnmap = pfnmap;
+		}
+
 	}
 done:
 	mmap_read_unlock(mm);
@@ -1097,6 +1213,12 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
 static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
 {
 	WARN_ON(!RB_EMPTY_ROOT(&dma->pfn_list));
+	if (dma->pfnmap) {
+		vfio_device_unregister_notifier(dma->pfnmap->device,
+						&dma->pfnmap->nb);
+		vfio_device_put(dma->pfnmap->device);
+		kfree(dma->pfnmap);
+	}
 	vfio_unmap_unpin(iommu, dma, true);
 	vfio_unlink_dma(iommu, dma);
 	put_task_struct(dma->task);
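[Editor's sketch, not part of the patch: a rough userspace mock of the teardown ordering the notifier path above relies on. All mock_* names are invented stand-ins; a plain function pointer replaces the kernel notifier chain and a global pointer replaces the workqueue. The point it demonstrates is that on VFIO_DEVICE_RELEASE the DMA mapping is dropped immediately, while unregistering the notifier and putting the device reference are deferred, since a notifier cannot remove itself while the chain is being walked (INIT_WORK()/schedule_work() in the real code).]

/* Illustration only -- NOT kernel code. */
#include <stdio.h>
#include <stdlib.h>

struct mock_device;
typedef int (*mock_notifier_fn)(struct mock_device *dev, void *data);

struct mock_device {
	mock_notifier_fn nb;	/* single-entry stand-in for a notifier chain */
	void *nb_data;
	int refs;
};

struct mock_pfnmap {
	struct mock_device *device;
	int dma_mapped;
};

static struct mock_pfnmap *deferred;	/* stand-in for the workqueue */

/* stands in for unregister_device_bg() running from the workqueue */
static void mock_unregister_bg(struct mock_pfnmap *pfnmap)
{
	pfnmap->device->nb = NULL;	/* vfio_device_unregister_notifier() */
	pfnmap->device->refs--;		/* vfio_device_put() */
	free(pfnmap);
}

/* stands in for vfio_device_nb_cb() handling VFIO_DEVICE_RELEASE */
static int mock_release_cb(struct mock_device *dev, void *data)
{
	struct mock_pfnmap *pfnmap = data;

	(void)dev;
	pfnmap->dma_mapped = 0;		/* vfio_remove_dma() equivalent */
	printf("DMA mapping dropped on device release\n");

	deferred = pfnmap;		/* schedule_work(&pfnmap->work) */
	return 0;
}

int main(void)
{
	struct mock_device dev = { .refs = 1 };
	struct mock_pfnmap *pfnmap = calloc(1, sizeof(*pfnmap));

	if (!pfnmap)
		return 1;

	/* page-walk path: take a device reference and register the notifier */
	pfnmap->device = &dev;
	pfnmap->dma_mapped = 1;
	dev.refs++;			/* vfio_device_get_from_vma() */
	dev.nb = mock_release_cb;	/* vfio_device_register_notifier() */
	dev.nb_data = pfnmap;

	/* device release fires the chain */
	if (dev.nb)
		dev.nb(&dev, dev.nb_data);

	/* the "workqueue" runs only after the chain walk has returned */
	if (deferred)
		mock_unregister_bg(deferred);

	printf("device refs back to %d\n", dev.refs);
	return 0;
}

[Whether a failure to find or register against a backing vfio device fails the DMA mapping is governed by the strict_mmio_maps module parameter added above; with it clear, the mapping proceeds but is simply not tracked.]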