From patchwork Mon Dec 16 09:59:15 2024
X-Patchwork-Submitter: Wei Lin Guay
X-Patchwork-Id: 13909492
From: Wei Lin Guay
Subject: [PATCH 1/4] vfio: Add vfio_device_get()
Date: Mon, 16 Dec 2024 01:59:15 -0800
Message-ID: <20241216095920.237117-2-wguay@fb.com>
In-Reply-To: <20241216095920.237117-1-wguay@fb.com>
References: <20241216095920.237117-1-wguay@fb.com>
X-Mailing-List: linux-rdma@vger.kernel.org

From: Jason Gunthorpe

Summary: Add vfio_device_get() to increment a reference that the caller
already holds. Export vfio_device_put_registration() to pair with it.
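
For readers less familiar with the pattern, the difference between an
unconditional get and the final put can be modelled in userspace. This is only
a sketch: struct demo_device and the demo_device_*() helpers are hypothetical
names, C11 atomics stand in for the kernel's refcount_t, and a boolean stands
in for complete(&device->comp).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the refcount in struct vfio_device. */
struct demo_device {
	atomic_uint refcount;
	bool released;	/* stands in for complete(&device->comp) */
};

static inline void demo_device_init(struct demo_device *dev)
{
	atomic_init(&dev->refcount, 1);	/* the registration reference */
	dev->released = false;
}

/*
 * Unconditional increment. Only valid while the caller already holds a
 * reference, so the count can never be observed as zero here.
 */
static inline void demo_device_get(struct demo_device *dev)
{
	unsigned int old = atomic_fetch_add(&dev->refcount, 1);

	assert(old != 0);
}

/* Drop one reference; the final put signals the release. */
static inline void demo_device_put(struct demo_device *dev)
{
	if (atomic_fetch_sub(&dev->refcount, 1) == 1)
		dev->released = true;
}
```

The unconditional increment is safe only because the caller's own reference
keeps the count above zero; code that cannot guarantee this needs a try-get
variant instead.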

Signed-off-by: Jason Gunthorpe
Signed-off-by: Wei Lin Guay
Reviewed-by: Dag Moxnes
Reviewed-by: Keith Busch
Reviewed-by: Nic Viljoen
---
 drivers/vfio/vfio_main.c | 1 +
 include/linux/vfio.h     | 6 ++++++
 2 files changed, 7 insertions(+)

--
2.43.5

diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index a5a62d9d963f..7e318e15abd5 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -171,6 +171,7 @@ void vfio_device_put_registration(struct vfio_device *device)
 	if (refcount_dec_and_test(&device->refcount))
 		complete(&device->comp);
 }
+EXPORT_SYMBOL_GPL(vfio_device_put_registration);
 
 bool vfio_device_try_get_registration(struct vfio_device *device)
 {
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 000a6cab2d31..d7c790be4bbc 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -279,6 +279,12 @@ static inline void vfio_put_device(struct vfio_device *device)
 int vfio_register_group_dev(struct vfio_device *device);
 int vfio_register_emulated_iommu_dev(struct vfio_device *device);
 void vfio_unregister_group_dev(struct vfio_device *device);
+void vfio_device_put_registration(struct vfio_device *device);
+
+static inline void vfio_device_get(struct vfio_device *device)
+{
+	refcount_inc(&device->refcount);
+}
 
 int vfio_assign_device_set(struct vfio_device *device, void *set_id);
 unsigned int vfio_device_set_open_count(struct vfio_device_set *dev_set);

From patchwork Mon Dec 16 09:59:16 2024
X-Patchwork-Submitter: Wei Lin Guay
X-Patchwork-Id: 13909490
From: Wei Lin Guay
Subject: [PATCH 2/4] dma-buf: Add dma_buf_try_get()
Date: Mon, 16 Dec 2024 01:59:16 -0800
Message-ID: <20241216095920.237117-3-wguay@fb.com>
In-Reply-To: <20241216095920.237117-1-wguay@fb.com>
References: <20241216095920.237117-1-wguay@fb.com>
X-Mailing-List: linux-rdma@vger.kernel.org

From: Jason Gunthorpe

Summary: Used to increment the refcount of the dma-buf's struct file, only if
the refcount is not zero. Useful to allow the struct file's lifetime to
control the lifetime of the dmabuf while still letting the driver keep track
of created dmabufs.
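
The increment-if-not-zero behaviour described above can be sketched in
userspace with a compare-and-swap loop. This is an illustration, not the
kernel implementation: struct demo_buf, demo_buf_try_get() and demo_buf_put()
are hypothetical names, and a C11 atomic_long stands in for the struct file
reference count.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an object whose lifetime is a file refcount. */
struct demo_buf {
	atomic_long f_count;
};

/* Take a reference only if the object is not already on its way out. */
static inline bool demo_buf_try_get(struct demo_buf *buf)
{
	long old = atomic_load(&buf->f_count);

	while (old != 0) {
		/* On failure the CAS refreshes 'old' with the current value. */
		if (atomic_compare_exchange_weak(&buf->f_count, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when this was the last one. */
static inline bool demo_buf_put(struct demo_buf *buf)
{
	return atomic_fetch_sub(&buf->f_count, 1) == 1;
}
```

A tracker that keeps a pointer without owning a reference would call the
try-get helper before touching the object and skip it when the call fails,
provided something else (RCU in the kernel case) keeps the memory itself valid
during the check.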

Signed-off-by: Jason Gunthorpe
Signed-off-by: Wei Lin Guay
Reviewed-by: Dag Moxnes
Reviewed-by: Keith Busch
Reviewed-by: Nic Viljoen
---
 include/linux/dma-buf.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

--
2.43.5

diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index 36216d28d8bd..9854578afecd 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -614,6 +614,19 @@ int dma_buf_fd(struct dma_buf *dmabuf, int flags);
 struct dma_buf *dma_buf_get(int fd);
 void dma_buf_put(struct dma_buf *dmabuf);
 
+/**
+ * dma_buf_try_get - try to get a reference on a dmabuf
+ * @dmabuf: the dmabuf to get
+ *
+ * Returns true if a reference was successfully obtained. The caller must
+ * interlock with the dmabuf's release function in some way, such as RCU, to
+ * ensure that this is not called on freed memory.
+ */
+static inline bool dma_buf_try_get(struct dma_buf *dmabuf)
+{
+	return get_file_rcu(&dmabuf->file);
+}
+
 struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *,
 					enum dma_data_direction);
 void dma_buf_unmap_attachment(struct dma_buf_attachment *, struct sg_table *,

From patchwork Mon Dec 16 09:59:17 2024
X-Patchwork-Submitter: Wei Lin Guay
X-Patchwork-Id: 13909491
From: Wei Lin Guay
Subject: [PATCH 3/4] vfio/pci: Allow MMIO regions to be exported through dma-buf
Date: Mon, 16 Dec 2024 01:59:17 -0800
Message-ID: <20241216095920.237117-4-wguay@fb.com>
In-Reply-To: <20241216095920.237117-1-wguay@fb.com>
References: <20241216095920.237117-1-wguay@fb.com>
X-Mailing-List: linux-rdma@vger.kernel.org

From: Jason Gunthorpe

Summary: dma-buf has become a way to safely acquire a handle to non-struct
page memory that can still have its lifetime controlled by the exporter.
Notably, RDMA can now import dma-buf FDs and build them into MRs, which allows
for PCI P2P operations. Extend this to allow vfio-pci to export MMIO memory
from PCI device BARs.

The patch design loosely follows the pattern in commit db1a8dd916aa
("habanalabs: add support for dma-buf exporter") except this does not support
pinning. Instead, this implements what, in the past, we've called a revocable
attachment using move.

In normal situations the attachment is pinned, as a BAR does not change
physical address. However, when the VFIO device is closed, or a PCI reset is
issued, access to the MMIO memory is revoked. Revoked means that move occurs,
but an attempt to immediately re-map the memory will fail. In the reset case a
future move will be triggered when MMIO access returns.
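
As a rough mental model of the revoke behaviour described above, the following
userspace sketch tracks a revoked flag, counts move notifications, and fails
new mappings while revoked. All names are hypothetical; the real exporter also
takes the reservation lock and calls dma_buf_move_notify(), which the sketch
reduces to a counter.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of one exported buffer's revoke state. */
struct demo_export {
	bool revoked;
	unsigned int move_notifications;
};

/* A mapping attempt fails with -ENODEV while access is revoked. */
static inline int demo_map(const struct demo_export *exp)
{
	return exp->revoked ? -ENODEV : 0;
}

/*
 * Change the revoke state. Importers are notified only on an actual
 * transition, so repeating the same state is a no-op.
 */
static inline void demo_move(struct demo_export *exp, bool revoked)
{
	if (exp->revoked == revoked)
		return;
	exp->revoked = revoked;
	exp->move_notifications++;
}
```

In this model a reset corresponds to demo_move(exp, true) before the reset and
demo_move(exp, false) once MMIO access is enabled again, so an importer sees
two notifications and can only re-map after the second.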

As both close and reset are under userspace control, it is expected that
userspace will suspend use of the dma-buf before doing these operations; the
revoke is purely for kernel self-defense against a hostile userspace.

Signed-off-by: Jason Gunthorpe
Signed-off-by: Wei Lin Guay
Reviewed-by: Dag Moxnes
Reviewed-by: Keith Busch
Reviewed-by: Nic Viljoen
---
 drivers/vfio/pci/Makefile          |   1 +
 drivers/vfio/pci/dma_buf.c         | 269 +++++++++++++++++++++++++++++
 drivers/vfio/pci/vfio_pci_config.c |   8 +-
 drivers/vfio/pci/vfio_pci_core.c   |  28 ++-
 drivers/vfio/pci/vfio_pci_priv.h   |  23 +++
 include/linux/vfio_pci_core.h      |   1 +
 include/uapi/linux/vfio.h          |  18 ++
 7 files changed, 340 insertions(+), 8 deletions(-)
 create mode 100644 drivers/vfio/pci/dma_buf.c

--
2.43.5

diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile
index cf00c0a7e55c..0cfdc9ede82f 100644
--- a/drivers/vfio/pci/Makefile
+++ b/drivers/vfio/pci/Makefile
@@ -2,6 +2,7 @@
 
 vfio-pci-core-y := vfio_pci_core.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o
 vfio-pci-core-$(CONFIG_VFIO_PCI_ZDEV_KVM) += vfio_pci_zdev.o
+vfio-pci-core-$(CONFIG_DMA_SHARED_BUFFER) += dma_buf.o
 obj-$(CONFIG_VFIO_PCI_CORE) += vfio-pci-core.o
 
 vfio-pci-y := vfio_pci.o
diff --git a/drivers/vfio/pci/dma_buf.c b/drivers/vfio/pci/dma_buf.c
new file mode 100644
index 000000000000..fd772b520cd7
--- /dev/null
+++ b/drivers/vfio/pci/dma_buf.c
@@ -0,0 +1,269 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
+ */
+#include
+#include
+#include
+
+#include "vfio_pci_priv.h"
+
+MODULE_IMPORT_NS(DMA_BUF);
+
+struct vfio_pci_dma_buf {
+	struct dma_buf *dmabuf;
+	struct vfio_pci_core_device *vdev;
+	struct list_head dmabufs_elm;
+	unsigned int index;
+	unsigned int orig_nents;
+	size_t offset;
+	bool revoked;
+};
+
+static int vfio_pci_dma_buf_attach(struct dma_buf *dmabuf,
+				   struct dma_buf_attachment *attachment)
+{
+	struct vfio_pci_dma_buf *priv = dmabuf->priv;
+	int rc;
+
+	rc = pci_p2pdma_distance_many(priv->vdev->pdev, &attachment->dev, 1,
+				      true);
+	if (rc < 0)
+		attachment->peer2peer = false;
+	return 0;
+}
+
+static void vfio_pci_dma_buf_unpin(struct dma_buf_attachment *attachment)
+{
+}
+
+static int vfio_pci_dma_buf_pin(struct dma_buf_attachment *attachment)
+{
+	/*
+	 * Uses the dynamic interface but must always allow for
+	 * dma_buf_move_notify() to do revoke
+	 */
+	return -EINVAL;
+}
+
+static struct sg_table *
+vfio_pci_dma_buf_map(struct dma_buf_attachment *attachment,
+		     enum dma_data_direction dir)
+{
+	size_t sgl_size = dma_get_max_seg_size(attachment->dev);
+	struct vfio_pci_dma_buf *priv = attachment->dmabuf->priv;
+	struct scatterlist *sgl;
+	struct sg_table *sgt;
+	dma_addr_t dma_addr;
+	unsigned int nents;
+	size_t offset;
+	int ret;
+
+	dma_resv_assert_held(priv->dmabuf->resv);
+
+	if (!attachment->peer2peer)
+		return ERR_PTR(-EPERM);
+
+	if (priv->revoked)
+		return ERR_PTR(-ENODEV);
+
+	sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
+	if (!sgt)
+		return ERR_PTR(-ENOMEM);
+
+	nents = DIV_ROUND_UP(priv->dmabuf->size, sgl_size);
+	ret = sg_alloc_table(sgt, nents, GFP_KERNEL);
+	if (ret)
+		goto err_kfree_sgt;
+
+	/*
+	 * Since the memory being mapped is a device memory it could never be in
+	 * CPU caches.
+	 */
+	dma_addr = dma_map_resource(
+		attachment->dev,
+		pci_resource_start(priv->vdev->pdev, priv->index) +
+			priv->offset,
+		priv->dmabuf->size, dir, DMA_ATTR_SKIP_CPU_SYNC);
+	ret = dma_mapping_error(attachment->dev, dma_addr);
+	if (ret)
+		goto err_free_sgt;
+
+	/*
+	 * Break the BAR's physical range up into max sized SGL's according to
+	 * the device's requirement.
+	 */
+	sgl = sgt->sgl;
+	for (offset = 0; offset != priv->dmabuf->size;) {
+		size_t chunk_size = min(priv->dmabuf->size - offset, sgl_size);
+
+		sg_set_page(sgl, NULL, chunk_size, 0);
+		sg_dma_address(sgl) = dma_addr + offset;
+		sg_dma_len(sgl) = chunk_size;
+		sgl = sg_next(sgl);
+		offset += chunk_size;
+	}
+
+	/*
+	 * Because we are not going to include a CPU list we want to have some
+	 * chance that other users will detect this by setting the orig_nents to
+	 * 0 and using only nents (length of DMA list) when going over the sgl
+	 */
+	priv->orig_nents = sgt->orig_nents;
+	sgt->orig_nents = 0;
+	return sgt;
+
+err_free_sgt:
+	sg_free_table(sgt);
+err_kfree_sgt:
+	kfree(sgt);
+	return ERR_PTR(ret);
+}
+
+static void vfio_pci_dma_buf_unmap(struct dma_buf_attachment *attachment,
+				   struct sg_table *sgt,
+				   enum dma_data_direction dir)
+{
+	struct vfio_pci_dma_buf *priv = attachment->dmabuf->priv;
+
+	sgt->orig_nents = priv->orig_nents;
+	dma_unmap_resource(attachment->dev, sg_dma_address(sgt->sgl),
+			   priv->dmabuf->size, dir, DMA_ATTR_SKIP_CPU_SYNC);
+	sg_free_table(sgt);
+	kfree(sgt);
+}
+
+static void vfio_pci_dma_buf_release(struct dma_buf *dmabuf)
+{
+	struct vfio_pci_dma_buf *priv = dmabuf->priv;
+
+	/*
+	 * Either this or vfio_pci_dma_buf_cleanup() will remove from the list.
+	 * The refcount prevents both.
+	 */
+	if (priv->vdev) {
+		down_write(&priv->vdev->memory_lock);
+		list_del_init(&priv->dmabufs_elm);
+		up_write(&priv->vdev->memory_lock);
+		vfio_device_put_registration(&priv->vdev->vdev);
+	}
+	kfree(priv);
+}
+
+static const struct dma_buf_ops vfio_pci_dmabuf_ops = {
+	.attach = vfio_pci_dma_buf_attach,
+	.map_dma_buf = vfio_pci_dma_buf_map,
+	.pin = vfio_pci_dma_buf_pin,
+	.unpin = vfio_pci_dma_buf_unpin,
+	.release = vfio_pci_dma_buf_release,
+	.unmap_dma_buf = vfio_pci_dma_buf_unmap,
+};
+
+int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
+				  struct vfio_device_feature_dma_buf __user *arg,
+				  size_t argsz)
+{
+	struct vfio_device_feature_dma_buf get_dma_buf;
+	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
+	struct vfio_pci_dma_buf *priv;
+	int ret;
+
+	ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_GET,
+				 sizeof(get_dma_buf));
+	if (ret != 1)
+		return ret;
+
+	if (copy_from_user(&get_dma_buf, arg, sizeof(get_dma_buf)))
+		return -EFAULT;
+
+	/* For PCI the region_index is the BAR number like everything else */
+	if (get_dma_buf.region_index >= VFIO_PCI_ROM_REGION_INDEX)
+		return -EINVAL;
+
+	exp_info.ops = &vfio_pci_dmabuf_ops;
+	exp_info.size = pci_resource_len(vdev->pdev, get_dma_buf.region_index);
+	if (!exp_info.size)
+		return -EINVAL;
+	if (get_dma_buf.offset || get_dma_buf.length) {
+		if (get_dma_buf.length > exp_info.size ||
+		    get_dma_buf.offset >= exp_info.size ||
+		    get_dma_buf.length > exp_info.size - get_dma_buf.offset ||
+		    get_dma_buf.offset % PAGE_SIZE ||
+		    get_dma_buf.length % PAGE_SIZE)
+			return -EINVAL;
+		exp_info.size = get_dma_buf.length;
+	}
+	exp_info.flags = get_dma_buf.open_flags;
+
+	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		return -ENOMEM;
+	INIT_LIST_HEAD(&priv->dmabufs_elm);
+	priv->offset = get_dma_buf.offset;
+	priv->index = get_dma_buf.region_index;
+
+	exp_info.priv = priv;
+	priv->dmabuf = dma_buf_export(&exp_info);
+	if (IS_ERR(priv->dmabuf)) {
+		ret = PTR_ERR(priv->dmabuf);
+		kfree(priv);
+		return ret;
+	}
+
+	/* dma_buf_put() now frees priv */
+
+	down_write(&vdev->memory_lock);
+	dma_resv_lock(priv->dmabuf->resv, NULL);
+	priv->revoked = !__vfio_pci_memory_enabled(vdev);
+	priv->vdev = vdev;
+	vfio_device_get(&vdev->vdev);
+	list_add_tail(&priv->dmabufs_elm, &vdev->dmabufs);
+	dma_resv_unlock(priv->dmabuf->resv);
+	up_write(&vdev->memory_lock);
+
+	/*
+	 * dma_buf_fd() consumes the reference, when the file closes the dmabuf
+	 * will be released.
+	 */
+	return dma_buf_fd(priv->dmabuf, get_dma_buf.open_flags);
+}
+
+void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)
+{
+	struct vfio_pci_dma_buf *priv;
+	struct vfio_pci_dma_buf *tmp;
+
+	lockdep_assert_held_write(&vdev->memory_lock);
+
+	list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
+		if (!dma_buf_try_get(priv->dmabuf))
+			continue;
+		if (priv->revoked != revoked) {
+			dma_resv_lock(priv->dmabuf->resv, NULL);
+			priv->revoked = revoked;
+			dma_buf_move_notify(priv->dmabuf);
+			dma_resv_unlock(priv->dmabuf->resv);
+		}
+		dma_buf_put(priv->dmabuf);
+	}
+}
+
+void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
+{
+	struct vfio_pci_dma_buf *priv;
+	struct vfio_pci_dma_buf *tmp;
+
+	down_write(&vdev->memory_lock);
+	list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
+		if (!dma_buf_try_get(priv->dmabuf))
+			continue;
+		dma_resv_lock(priv->dmabuf->resv, NULL);
+		list_del_init(&priv->dmabufs_elm);
+		priv->vdev = NULL;
+		priv->revoked = true;
+		dma_buf_move_notify(priv->dmabuf);
+		dma_resv_unlock(priv->dmabuf->resv);
+		vfio_device_put_registration(&vdev->vdev);
+		dma_buf_put(priv->dmabuf);
+	}
+	up_write(&vdev->memory_lock);
+}
diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index 97422aafaa7b..c605c5cb0078 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -585,10 +585,12 @@ static int vfio_basic_config_write(struct vfio_pci_core_device *vdev, int pos,
 		virt_mem = !!(le16_to_cpu(*virt_cmd) & PCI_COMMAND_MEMORY);
 		new_mem = !!(new_cmd & PCI_COMMAND_MEMORY);
 
-		if (!new_mem)
+		if (!new_mem) {
 			vfio_pci_zap_and_down_write_memory_lock(vdev);
-		else
+			vfio_pci_dma_buf_move(vdev, true);
+		} else {
 			down_write(&vdev->memory_lock);
+		}
 
 		/*
 		 * If the user is writing mem/io enable (new_mem/io) and we
@@ -623,6 +625,8 @@ static int vfio_basic_config_write(struct vfio_pci_core_device *vdev, int pos,
 		*virt_cmd &= cpu_to_le16(~mask);
 		*virt_cmd |= cpu_to_le16(new_cmd & mask);
 
+		if (__vfio_pci_memory_enabled(vdev))
+			vfio_pci_dma_buf_move(vdev, false);
 		up_write(&vdev->memory_lock);
 	}
 
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index ba0ce0075b2f..bb97b4d94eb7 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -700,6 +700,8 @@ void vfio_pci_core_close_device(struct vfio_device *core_vdev)
 #endif
 	vfio_pci_core_disable(vdev);
 
+	vfio_pci_dma_buf_cleanup(vdev);
+
 	mutex_lock(&vdev->igate);
 	if (vdev->err_trigger) {
 		eventfd_ctx_put(vdev->err_trigger);
@@ -1244,7 +1246,10 @@ static int vfio_pci_ioctl_reset(struct vfio_pci_core_device *vdev,
 	 */
 	vfio_pci_set_power_state(vdev, PCI_D0);
 
+	vfio_pci_dma_buf_move(vdev, true);
 	ret = pci_try_reset_function(vdev->pdev);
+	if (__vfio_pci_memory_enabled(vdev))
+		vfio_pci_dma_buf_move(vdev, false);
 	up_write(&vdev->memory_lock);
 
 	return ret;
@@ -1490,11 +1495,10 @@ long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd,
 }
 EXPORT_SYMBOL_GPL(vfio_pci_core_ioctl);
 
-static int vfio_pci_core_feature_token(struct vfio_device *device, u32 flags,
-				       uuid_t __user *arg, size_t argsz)
+static int vfio_pci_core_feature_token(struct vfio_pci_core_device *vdev,
+				       u32 flags, uuid_t __user *arg,
+				       size_t argsz)
 {
-	struct vfio_pci_core_device *vdev =
-		container_of(device, struct vfio_pci_core_device, vdev);
 	uuid_t uuid;
 	int ret;
 
@@ -1521,6 +1525,9 @@ static int vfio_pci_core_feature_token(struct vfio_device *device, u32 flags,
 int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
 				void __user *arg, size_t argsz)
 {
+	struct vfio_pci_core_device *vdev =
+		container_of(device, struct vfio_pci_core_device, vdev);
+
 	switch (flags & VFIO_DEVICE_FEATURE_MASK) {
 	case VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY:
 		return vfio_pci_core_pm_entry(device, flags, arg, argsz);
@@ -1530,7 +1537,9 @@ int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
 	case VFIO_DEVICE_FEATURE_LOW_POWER_EXIT:
 		return vfio_pci_core_pm_exit(device, flags, arg, argsz);
 	case VFIO_DEVICE_FEATURE_PCI_VF_TOKEN:
-		return vfio_pci_core_feature_token(device, flags, arg, argsz);
+		return vfio_pci_core_feature_token(vdev, flags, arg, argsz);
+	case VFIO_DEVICE_FEATURE_DMA_BUF:
+		return vfio_pci_core_feature_dma_buf(vdev, flags, arg, argsz);
 	default:
 		return -ENOTTY;
 	}
@@ -2083,6 +2092,7 @@ int vfio_pci_core_init_dev(struct vfio_device *core_vdev)
 	INIT_LIST_HEAD(&vdev->sriov_pfs_item);
 	init_rwsem(&vdev->memory_lock);
 	xa_init(&vdev->ctx);
+	INIT_LIST_HEAD(&vdev->dmabufs);
 
 	return 0;
 }
@@ -2463,11 +2473,17 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
 	 * cause the PCI config space reset without restoring the original
 	 * state (saved locally in 'vdev->pm_save').
 	 */
-	list_for_each_entry(vdev, &dev_set->device_list, vdev.dev_set_list)
+	list_for_each_entry(vdev, &dev_set->device_list, vdev.dev_set_list) {
+		vfio_pci_dma_buf_move(vdev, true);
 		vfio_pci_set_power_state(vdev, PCI_D0);
+	}
 
 	ret = pci_reset_bus(pdev);
 
+	list_for_each_entry(vdev, &dev_set->device_list, vdev.dev_set_list)
+		if (__vfio_pci_memory_enabled(vdev))
+			vfio_pci_dma_buf_move(vdev, false);
+
 	vdev = list_last_entry(&dev_set->device_list,
 			       struct vfio_pci_core_device, vdev.dev_set_list);
 
diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
index 5e4fa69aee16..09d3c300918c 100644
--- a/drivers/vfio/pci/vfio_pci_priv.h
+++ b/drivers/vfio/pci/vfio_pci_priv.h
@@ -101,4 +101,27 @@ static inline bool vfio_pci_is_vga(struct pci_dev *pdev)
 	return (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA;
 }
 
+#ifdef CONFIG_DMA_SHARED_BUFFER
+int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
+				  struct vfio_device_feature_dma_buf __user *arg,
+				  size_t argsz);
+void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev);
+void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked);
+#else
+static inline int
+vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
+			      struct vfio_device_feature_dma_buf __user *arg,
+			      size_t argsz)
+{
+	return -ENOTTY;
+}
+static inline void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
+{
+}
+static inline void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev,
+					 bool revoked)
+{
+}
+#endif
+
 #endif
diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h
index fbb472dd99b3..da5d8955ae56 100644
--- a/include/linux/vfio_pci_core.h
+++ b/include/linux/vfio_pci_core.h
@@ -94,6 +94,7 @@ struct vfio_pci_core_device {
 	struct vfio_pci_core_device *sriov_pf_core_dev;
 	struct notifier_block nb;
 	struct rw_semaphore memory_lock;
+	struct list_head dmabufs;
 };
 
 /* Will be exported for vfio pci drivers usage */
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 2b68e6cdf190..8812b4750cc5 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -1458,6 +1458,24 @@ struct vfio_device_feature_bus_master {
 };
 #define VFIO_DEVICE_FEATURE_BUS_MASTER 10
 
+/**
+ * Upon VFIO_DEVICE_FEATURE_GET create a dma_buf fd for the
+ * region selected.
+ *
+ * open_flags are the typical flags passed to open(2), eg O_RDWR, O_CLOEXEC,
+ * etc. offset/length specify a slice of the region to create the dmabuf from.
+ * If both are 0 then the whole region is used.
+ *
+ * Return: The fd number on success, -1 and errno is set on failure.
+ */
+struct vfio_device_feature_dma_buf {
+	__u32 region_index;
+	__u32 open_flags;
+	__u32 offset;
+	__u64 length;
+};
+#define VFIO_DEVICE_FEATURE_DMA_BUF 11
+
 /* -------- API for Type1 VFIO IOMMU -------- */
 
 /**

From patchwork Mon Dec 16 09:59:18 2024
X-Patchwork-Submitter: Wei Lin Guay
X-Patchwork-Id: 13909493
From: Wei Lin Guay
Subject: [PATCH 4/4] vfio/pci: Allow export dmabuf without move_notify from importer
Date: Mon, 16 Dec 2024 01:59:18 -0800
Message-ID: <20241216095920.237117-5-wguay@fb.com>
In-Reply-To: <20241216095920.237117-1-wguay@fb.com>
References: <20241216095920.237117-1-wguay@fb.com>
X-Mailing-List: linux-rdma@vger.kernel.org

From: Wei Lin Guay

Summary: Allow vfio to export a dmabuf to an importer, such as an RDMA NIC,
that does not support the move_notify callback, since not all RDMA drivers
support on-demand paging (ODP). Some use cases, such as a bind accelerator,
always pin the device memory via vfio and export it to an RDMA NIC such as
EFA, BNXT_RE or IRDMA that does not support ODP.
Signed-off-by: Wei Lin Guay
Reviewed-by: Dag Moxnes
Reviewed-by: Keith Busch
Reviewed-by: Nic Viljoen
---
 drivers/vfio/pci/dma_buf.c       | 32 +++++++++++++++++++++++++++-----
 drivers/vfio/pci/vfio_pci_core.c | 16 ++++++++++++++++
 drivers/vfio/pci/vfio_pci_priv.h |  7 +++++++
 3 files changed, 50 insertions(+), 5 deletions(-)

-- 
2.43.5

diff --git a/drivers/vfio/pci/dma_buf.c b/drivers/vfio/pci/dma_buf.c
index fd772b520cd7..8017f48296cb 100644
--- a/drivers/vfio/pci/dma_buf.c
+++ b/drivers/vfio/pci/dma_buf.c
@@ -17,6 +17,7 @@ struct vfio_pci_dma_buf {
 	unsigned int orig_nents;
 	size_t offset;
 	bool revoked;
+	bool pinned;
 };
 
 static int vfio_pci_dma_buf_attach(struct dma_buf *dmabuf,
@@ -32,17 +33,38 @@ static int vfio_pci_dma_buf_attach(struct dma_buf *dmabuf,
 	return 0;
 }
 
+bool vfio_pci_dma_buf_pinned(struct vfio_pci_core_device *vdev)
+{
+	struct vfio_pci_dma_buf *priv;
+	struct vfio_pci_dma_buf *tmp;
+	bool pinned = false;
+
+	down_write(&vdev->memory_lock);
+	list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
+		if (!dma_buf_try_get(priv->dmabuf))
+			continue;
+		if (priv->pinned) {
+			pinned = true;
+			break;
+		}
+	}
+	up_write(&vdev->memory_lock);
+	return pinned;
+}
+
 static void vfio_pci_dma_buf_unpin(struct dma_buf_attachment *attachment)
 {
+	struct vfio_pci_dma_buf *priv = attachment->dmabuf->priv;
+
+	priv->pinned = false;
 }
 
 static int vfio_pci_dma_buf_pin(struct dma_buf_attachment *attachment)
 {
-	/*
-	 * Uses the dynamic interface but must always allow for
-	 * dma_buf_move_notify() to do revoke
-	 */
-	return -EINVAL;
+	struct vfio_pci_dma_buf *priv = attachment->dmabuf->priv;
+
+	priv->pinned = true;
+	return 0;
 }
 
 static struct sg_table *
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index bb97b4d94eb7..db28fa2cc9a8 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1246,6 +1246,13 @@ static int vfio_pci_ioctl_reset(struct vfio_pci_core_device *vdev,
 	 */
 	vfio_pci_set_power_state(vdev, PCI_D0);
 
+	/*
+	 * prevent reset if dma_buf is pinned to avoid stale pinned
+	 * expose to the dmabuf exporter.
+	 */
+	if (vfio_pci_dma_buf_pinned(vdev))
+		return -EINVAL;
+
 	vfio_pci_dma_buf_move(vdev, true);
 	ret = pci_try_reset_function(vdev->pdev);
 	if (__vfio_pci_memory_enabled(vdev))
@@ -2444,6 +2451,15 @@ static int vfio_pci_dev_set_hot_reset(struct vfio_device_set *dev_set,
 			break;
 		}
 
+		/*
+		 * prevent reset if dma_buf is pinned to avoid stale pinned
+		 * expose to the dmabuf exporter.
+		 */
+		if (vfio_pci_dma_buf_pinned(vdev)) {
+			ret = -EINVAL;
+			break;
+		}
+
 		/*
 		 * Take the memory write lock for each device and zap BAR
 		 * mappings to prevent the user accessing the device while in
diff --git a/drivers/vfio/pci/vfio_pci_priv.h b/drivers/vfio/pci/vfio_pci_priv.h
index 09d3c300918c..43c40dc4751c 100644
--- a/drivers/vfio/pci/vfio_pci_priv.h
+++ b/drivers/vfio/pci/vfio_pci_priv.h
@@ -107,6 +107,7 @@ int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
 				  size_t argsz);
 void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev);
 void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked);
+bool vfio_pci_dma_buf_pinned(struct vfio_pci_core_device *vdev);
 #else
 static int
 vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
@@ -115,6 +116,12 @@ vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
 {
 	return -ENOTTY;
 }
+
+static inline bool vfio_pci_dma_buf_pinned(struct vfio_pci_core_device *vdev)
+{
+	return false;
+}
+
 static inline void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev)
 {
 }
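
For readers less familiar with the pin path, the behaviour this patch aims for (an importer's pin blocks device reset until it unpins) can be modelled in plain userspace C. This is only a sketch under stated assumptions: model_dmabuf, model_device, model_pin(), model_unpin(), model_any_pinned() and model_reset() are hypothetical names that do not exist in the kernel, and locking, reference counting and revocation through move_notify are deliberately left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vfio_pci_dma_buf: only the flag matters. */
struct model_dmabuf {
	bool pinned;
};

/* Hypothetical stand-in for the per-device list of exported dmabufs. */
struct model_device {
	struct model_dmabuf *bufs;
	size_t nbufs;
	unsigned int resets;	/* number of resets that were allowed */
};

/* Models the pin callback: record that an importer pinned the buffer. */
static void model_pin(struct model_dmabuf *buf)
{
	buf->pinned = true;
}

/* Models the unpin callback: clear the pinned state again. */
static void model_unpin(struct model_dmabuf *buf)
{
	buf->pinned = false;
}

/* Models the intent of vfio_pci_dma_buf_pinned(): any pinned export? */
static bool model_any_pinned(const struct model_device *dev)
{
	for (size_t i = 0; i < dev->nbufs; i++) {
		if (dev->bufs[i].pinned)
			return true;
	}
	return false;
}

/* Models the reset paths: refuse with -EINVAL while an export is pinned. */
static int model_reset(struct model_device *dev)
{
	if (model_any_pinned(dev))
		return -EINVAL;
	dev->resets++;
	return 0;
}
```

In this model, calling model_pin() on any buffer makes model_reset() return -EINVAL, and the same call succeeds once model_unpin() has cleared the flag.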