Message ID | 0-v1-9e6e1739ed95+5fa-vfio_dma_buf_jgg@nvidia.com (mailing list archive)
Series     | Allow MMIO regions to be exported through dma-buf
Am 17.08.22 um 18:11 schrieb Jason Gunthorpe:
> dma-buf has become a way to safely acquire a handle to non-struct page
> memory that can still have lifetime controlled by the exporter. Notably
> RDMA can now import dma-buf FDs and build them into MRs which allows for
> PCI P2P operations. Extend this to allow vfio-pci to export MMIO memory
> from PCI device BARs.
>
> This series supports a use case for SPDK where a NVMe device will be owned
> by SPDK through VFIO but interacting with a RDMA device. The RDMA device
> may directly access the NVMe CMB or directly manipulate the NVMe device's
> doorbell using PCI P2P.
>
> However, as a general mechanism, it can support many other scenarios with
> VFIO. I imagine this dmabuf approach to be usable by iommufd as well for
> generic and safe P2P mappings.

In general looks good to me, but we really need to get away from using
sg_tables for this here.

The only thing I'm not 100% convinced of is dma_buf_try_get(), I've seen
this incorrectly used so many times that I can't count them any more.

Would that be somehow avoidable? Or could you at least explain the use case
a bit better.

Thanks,
Christian.

> This series goes after the "Break up ioctl dispatch functions to one
> function per ioctl" series.
>
> This is on github: https://github.com/jgunthorpe/linux/commits/vfio_dma_buf
>
> Jason Gunthorpe (4):
>   dma-buf: Add dma_buf_try_get()
>   vfio: Add vfio_device_get()
>   vfio_pci: Do not open code pci_try_reset_function()
>   vfio/pci: Allow MMIO regions to be exported through dma-buf
>
>  drivers/vfio/pci/Makefile           |   1 +
>  drivers/vfio/pci/vfio_pci_config.c  |  22 ++-
>  drivers/vfio/pci/vfio_pci_core.c    |  33 +++-
>  drivers/vfio/pci/vfio_pci_dma_buf.c | 265 ++++++++++++++++++++++++++++
>  drivers/vfio/pci/vfio_pci_priv.h    |  24 +++
>  drivers/vfio/vfio_main.c            |   3 +-
>  include/linux/dma-buf.h             |  13 ++
>  include/linux/vfio.h                |   6 +
>  include/linux/vfio_pci_core.h       |   1 +
>  include/uapi/linux/vfio.h           |  18 ++
>  10 files changed, 364 insertions(+), 22 deletions(-)
>  create mode 100644 drivers/vfio/pci/vfio_pci_dma_buf.c
>
> base-commit: 385f0411fcd2780b5273992832cdc8edcd5b8ea9
On Thu, Aug 18, 2022 at 01:07:16PM +0200, Christian König wrote:
> Am 17.08.22 um 18:11 schrieb Jason Gunthorpe:
> > dma-buf has become a way to safely acquire a handle to non-struct page
> > memory that can still have lifetime controlled by the exporter. Notably
> > RDMA can now import dma-buf FDs and build them into MRs which allows for
> > PCI P2P operations. Extend this to allow vfio-pci to export MMIO memory
> > from PCI device BARs.
> >
> > This series supports a use case for SPDK where a NVMe device will be owned
> > by SPDK through VFIO but interacting with a RDMA device. The RDMA device
> > may directly access the NVMe CMB or directly manipulate the NVMe device's
> > doorbell using PCI P2P.
> >
> > However, as a general mechanism, it can support many other scenarios with
> > VFIO. I imagine this dmabuf approach to be usable by iommufd as well for
> > generic and safe P2P mappings.
>
> In general looks good to me, but we really need to get away from using
> sg_tables for this here.
>
> The only thing I'm not 100% convinced of is dma_buf_try_get(), I've seen
> this incorrectly used so many times that I can't count them any more.
>
> Would that be somehow avoidable? Or could you at least explain the use case
> a bit better.

I didn't see a way, maybe you know of one.

VFIO needs to maintain a list of dmabuf FDs that have been created by
the user attached to each vfio_device:

int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
				  struct vfio_device_feature_dma_buf __user *arg,
				  size_t argsz)
{
	down_write(&vdev->memory_lock);
	list_add_tail(&priv->dmabufs_elm, &vdev->dmabufs);
	up_write(&vdev->memory_lock);

And dmabuf FDs are removed from the list when the user closes the FD:

static void vfio_pci_dma_buf_release(struct dma_buf *dmabuf)
{
	down_write(&priv->vdev->memory_lock);
	list_del_init(&priv->dmabufs_elm);
	up_write(&priv->vdev->memory_lock);

Which then poses the problem: how do you iterate over only dmabufs
that are still alive to execute move?

This seems necessary as parts of the dma_buf have already been
destroyed by the time the user's release function is called.

Which I solved like this:

	down_write(&vdev->memory_lock);
	list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
		if (!dma_buf_try_get(priv->dmabuf))
			continue;

So the scenarios resolve as:
 - Concurrent release is not in progress: dma_buf_try_get() succeeds
   and prevents concurrent release from starting
 - Release has started but not reached its memory_lock:
   dma_buf_try_get() fails
 - Release has started but passed its memory_lock: dmabuf is not on
   the list so dma_buf_try_get() is not called.

Jason
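The race resolution Jason lays out can be modeled outside the kernel. The sketch below is a hypothetical user-space analogue, not kernel API: `obj_try_get`/`obj_put` are illustrative names, and the refcount is modeled with plain C11 atomics rather than the kernel's refcount_t. It shows the core property the scheme relies on: upgrading a weak reference to a strong one succeeds only while release has not yet begun (refcount still above zero).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of an object with try-get semantics.  A refcount
 * of 0 means release has started and the object must not be revived. */
struct obj {
	atomic_int refcount;
};

/* Analogue of dma_buf_try_get(): an inc-not-zero on the refcount.
 * Returns true only if the object was still alive, in which case the
 * caller now holds a strong reference. */
static bool obj_try_get(struct obj *o)
{
	int old = atomic_load(&o->refcount);

	while (old > 0)
		if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
			return true;	/* weak ref upgraded to strong */
	return false;			/* release in progress: skip it */
}

/* Drop a strong reference; returns true when this was the final put,
 * i.e. the point where release would begin. */
static bool obj_put(struct obj *o)
{
	return atomic_fetch_sub(&o->refcount, 1) == 1;
}
```

A list walk would call `obj_try_get()` on each entry and simply `continue` past the ones that fail, exactly as the `list_for_each_entry_safe` loop above does.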
On Wed, Aug 17, 2022 at 01:11:38PM -0300, Jason Gunthorpe wrote:
> dma-buf has become a way to safely acquire a handle to non-struct page
> memory that can still have lifetime controlled by the exporter. Notably
> RDMA can now import dma-buf FDs and build them into MRs which allows for
> PCI P2P operations. Extend this to allow vfio-pci to export MMIO memory
> from PCI device BARs.
>
> This series supports a use case for SPDK where a NVMe device will be owned
> by SPDK through VFIO but interacting with a RDMA device. The RDMA device
> may directly access the NVMe CMB or directly manipulate the NVMe device's
> doorbell using PCI P2P.
>
> However, as a general mechanism, it can support many other scenarios with
> VFIO. I imagine this dmabuf approach to be usable by iommufd as well for
> generic and safe P2P mappings.
>
> This series goes after the "Break up ioctl dispatch functions to one
> function per ioctl" series.
>
> This is on github: https://github.com/jgunthorpe/linux/commits/vfio_dma_buf
>
> Jason Gunthorpe (4):
>   dma-buf: Add dma_buf_try_get()
>   vfio: Add vfio_device_get()
>   vfio_pci: Do not open code pci_try_reset_function()
>   vfio/pci: Allow MMIO regions to be exported through dma-buf
>
>  drivers/vfio/pci/Makefile           |   1 +
>  drivers/vfio/pci/vfio_pci_config.c  |  22 ++-
>  drivers/vfio/pci/vfio_pci_core.c    |  33 +++-
>  drivers/vfio/pci/vfio_pci_dma_buf.c | 265 ++++++++++++++++++++++++++++

I forgot about this..

Alex, do you want to start doing as Linus discussed and I will rename
this new file to "dma_buf.c"?

Or keep this directory as having the vfio_pci_* prefix for consistency?

Jason
Am 18.08.22 um 14:03 schrieb Jason Gunthorpe:
> On Thu, Aug 18, 2022 at 01:07:16PM +0200, Christian König wrote:
>> Am 17.08.22 um 18:11 schrieb Jason Gunthorpe:
>>> dma-buf has become a way to safely acquire a handle to non-struct page
>>> memory that can still have lifetime controlled by the exporter. Notably
>>> RDMA can now import dma-buf FDs and build them into MRs which allows for
>>> PCI P2P operations. Extend this to allow vfio-pci to export MMIO memory
>>> from PCI device BARs.
>>>
>>> This series supports a use case for SPDK where a NVMe device will be owned
>>> by SPDK through VFIO but interacting with a RDMA device. The RDMA device
>>> may directly access the NVMe CMB or directly manipulate the NVMe device's
>>> doorbell using PCI P2P.
>>>
>>> However, as a general mechanism, it can support many other scenarios with
>>> VFIO. I imagine this dmabuf approach to be usable by iommufd as well for
>>> generic and safe P2P mappings.
>> In general looks good to me, but we really need to get away from using
>> sg_tables for this here.
>>
>> The only thing I'm not 100% convinced of is dma_buf_try_get(), I've seen
>> this incorrectly used so many times that I can't count them any more.
>>
>> Would that be somehow avoidable? Or could you at least explain the use case
>> a bit better.
> I didn't see a way, maybe you know of one

For GEM objects we usually don't use the reference count of the DMA-buf,
but rather that of the GEM object for this. But that's not an ideal
solution either.

>
> VFIO needs to maintain a list of dmabuf FDs that have been created by
> the user attached to each vfio_device:
>
> int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
> 				  struct vfio_device_feature_dma_buf __user *arg,
> 				  size_t argsz)
> {
> 	down_write(&vdev->memory_lock);
> 	list_add_tail(&priv->dmabufs_elm, &vdev->dmabufs);
> 	up_write(&vdev->memory_lock);
>
> And dmabuf FD's are removed from the list when the user closes the FD:
>
> static void vfio_pci_dma_buf_release(struct dma_buf *dmabuf)
> {
> 	down_write(&priv->vdev->memory_lock);
> 	list_del_init(&priv->dmabufs_elm);
> 	up_write(&priv->vdev->memory_lock);
>
> Which then poses the problem: How do you iterate over only dma_buf's
> that are still alive to execute move?
>
> This seems necessary as parts of the dma_buf have already been
> destroyed by the time the user's release function is called.
>
> Which I solved like this:
>
> 	down_write(&vdev->memory_lock);
> 	list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
> 		if (!dma_buf_try_get(priv->dmabuf))
> 			continue;

What would happen if you don't skip destroyed dma-bufs here? In other
words, why do you maintain that list in the first place?

Regards,
Christian.

> So the scenarios resolve as:
>  - Concurrent release is not in progress: dma_buf_try_get() succeeds
>    and prevents concurrent release from starting
>  - Release has started but not reached its memory_lock:
>    dma_buf_try_get() fails
>  - Release has started but passed its memory_lock: dmabuf is not on
>    the list so dma_buf_try_get() is not called.
>
> Jason
On Thu, Aug 18, 2022 at 02:58:10PM +0200, Christian König wrote:
> > > The only thing I'm not 100% convinced of is dma_buf_try_get(), I've seen
> > > this incorrectly used so many times that I can't count them any more.
> > >
> > > Would that be somehow avoidable? Or could you at least explain the use case
> > > a bit better.
> > I didn't see a way, maybe you know of one
>
> For GEM objects we usually don't use the reference count of the DMA-buf, but
> rather that of the GEM object for this. But that's not an ideal solution
> either.

You can't really ignore the dmabuf refcount. At some point you have to
deal with the dmabuf being asynchronously released by userspace.

> > down_write(&vdev->memory_lock);
> > list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
> > 	if (!dma_buf_try_get(priv->dmabuf))
> > 		continue;
>
> What would happen if you don't skip destroyed dma-bufs here? In other words
> why do you maintain that list in the first place?

The list is to keep track of the dmabufs that were created, it is not
optional.

The only question is what do you do about invoking
dma_buf_move_notify() on a dmabuf that is already undergoing
destruction.

For instance undergoing destruction means the dmabuf core has already
done this:

	mutex_lock(&db_list.lock);
	list_del(&dmabuf->list_node);
	mutex_unlock(&db_list.lock);
	dma_buf_stats_teardown(dmabuf);

So it seems non-ideal to continue to use it.

However, dma_buf_move_notify() specifically has no issue with that level
of destruction since it only does

	list_for_each_entry(attach, &dmabuf->attachments, node)

And attachments must be empty if the file refcount is zero.

So we could delete the try_buf and just rely on move being safe on
partially destroyed dma_buf's as part of the API design.

Jason
Am 18.08.22 um 15:16 schrieb Jason Gunthorpe:
> On Thu, Aug 18, 2022 at 02:58:10PM +0200, Christian König wrote:
>
>>>> The only thing I'm not 100% convinced of is dma_buf_try_get(), I've seen
>>>> this incorrectly used so many times that I can't count them any more.
>>>>
>>>> Would that be somehow avoidable? Or could you at least explain the use case
>>>> a bit better.
>>> I didn't see a way, maybe you know of one
>> For GEM objects we usually don't use the reference count of the DMA-buf, but
>> rather that of the GEM object for this. But that's not an ideal solution
>> either.
> You can't really ignore the dmabuf refcount. At some point you have to
> deal with the dmabuf being asynchronously released by userspace.

Yeah, but in this case the dma-buf is just a reference to the
real/private object which holds the backing store.

When the dma-buf is released you drop the real object reference, and
from your driver internals you try_get only the real object.

The advantage is that only your driver can use the try_get function and
not some importing driver which doesn't know about the internals of the
exporter.

We just had too many cases where developers weren't sure if a pointer is
still valid and by using try_get it just "magically" got working (well I
have to admit it made the crashing less likely....).

>>> down_write(&vdev->memory_lock);
>>> list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
>>> 	if (!dma_buf_try_get(priv->dmabuf))
>>> 		continue;
>> What would happen if you don't skip destroyed dma-bufs here? In other words
>> why do you maintain that list in the first place?
> The list is to keep track of the dmabufs that were created, it is not
> optional.
>
> The only question is what do you do about invoking
> dma_buf_move_notify() on a dmabuf that is already undergoing
> destruction.

Ah, yes. Really good point.

> For instance undergoing destruction means the dmabuf core has already
> done this:
>
> 	mutex_lock(&db_list.lock);
> 	list_del(&dmabuf->list_node);
> 	mutex_unlock(&db_list.lock);
> 	dma_buf_stats_teardown(dmabuf);
>
> So it seems non-ideal to continue to use it.
>
> However, dma_buf_move_notify() specifically has no issue with that level of
> destruction since it only does
>
> 	list_for_each_entry(attach, &dmabuf->attachments, node)
>
> And attachments must be empty if the file refcount is zero.
>
> So we could delete the try_buf and just rely on move being safe on
> partially destroyed dma_buf's as part of the API design.

I think that might be the more defensive approach. A comment on the
dma_buf_move_notify() function should probably be a good idea.

Thanks,
Christian.

> Jason
On Thu, Aug 18, 2022 at 03:37:01PM +0200, Christian König wrote:
> Am 18.08.22 um 15:16 schrieb Jason Gunthorpe:
> > On Thu, Aug 18, 2022 at 02:58:10PM +0200, Christian König wrote:
> >
> > > > > The only thing I'm not 100% convinced of is dma_buf_try_get(), I've seen
> > > > > this incorrectly used so many times that I can't count them any more.
> > > > >
> > > > > Would that be somehow avoidable? Or could you at least explain the use case
> > > > > a bit better.
> > > > I didn't see a way, maybe you know of one
> > > For GEM objects we usually don't use the reference count of the DMA-buf, but
> > > rather that of the GEM object for this. But that's not an ideal solution
> > > either.
> > You can't really ignore the dmabuf refcount. At some point you have to
> > deal with the dmabuf being asynchronously released by userspace.
>
> Yeah, but in this case the dma-buf is just a reference to the real/private
> object which holds the backing store.

The gem approach is backwards to what I did here.

GEM holds a singleton pointer to the dmabuf and holds a reference on
it as long as it has the pointer. This means the dmabuf can not be
freed until the GEM object is freed.

For this I held a "weak reference" on the dmabuf in a list, and we
convert the weak reference to a strong reference in the usual way
using a try_get.

The reason it is different is because the VFIO interface allows
creating a DMABUF with unique parameters on every user request. Eg the
user can select a BAR index and a slice of the MMIO space unique to
each request and this results in a unique DMABUF.

Due to this we have to store a list of DMABUFs and we need the
DMABUFs to clean up their memory when the user closes the file.

> > So we could delete the try_buf and just rely on move being safe on
> > partially destroyed dma_buf's as part of the API design.
>
> I think that might be the more defensive approach. A comment on the
> dma_buf_move_notify() function should probably be a good idea.

IMHO, it is an anti-pattern. The caller should hold a strong reference
on an object before invoking any API surface. Upgrading a weak
reference to a strong reference requires the standard "try get" API.

But if you feel strongly I don't mind dropping the try_get around move.

Jason
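The weak-reference list Jason contrasts with GEM's singleton pointer can be sketched in plain C. Everything below is illustrative, not kernel code: `struct buf` stands in for the per-export private data, and `notify_all()` stands in for the move-notification walk over the device's dmabuf list, silently skipping entries whose release has already begun.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-export object: one per DMABUF the user created. */
struct buf {
	atomic_int refcount;	/* 0 once userspace closed the FD */
	struct buf *next;	/* device's list of exported bufs */
	int moved;		/* counts move notifications received */
};

/* Weak-to-strong upgrade: succeeds only while refcount > 0. */
static bool buf_try_get(struct buf *b)
{
	int old = atomic_load(&b->refcount);

	while (old > 0)
		if (atomic_compare_exchange_weak(&b->refcount, &old, old + 1))
			return true;
	return false;
}

static void buf_put(struct buf *b)
{
	atomic_fetch_sub(&b->refcount, 1);
}

/* Analogue of walking vdev->dmabufs to send move notifications:
 * only entries still alive are notified; dying ones are skipped. */
static int notify_all(struct buf *head)
{
	int notified = 0;

	for (struct buf *b = head; b; b = b->next) {
		if (!buf_try_get(b))
			continue;	/* release in flight: skip */
		b->moved++;		/* stands in for dma_buf_move_notify() */
		notified++;
		buf_put(b);
	}
	return notified;
}
```

The design point is that the list holds only weak references, so an entry on the list never keeps its buffer alive; the walk either upgrades the reference or proves the buffer is already being torn down.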
Am 19.08.22 um 15:11 schrieb Jason Gunthorpe:
> On Thu, Aug 18, 2022 at 03:37:01PM +0200, Christian König wrote:
>> Am 18.08.22 um 15:16 schrieb Jason Gunthorpe:
>>> On Thu, Aug 18, 2022 at 02:58:10PM +0200, Christian König wrote:
>>>
>>>>>> The only thing I'm not 100% convinced of is dma_buf_try_get(), I've seen
>>>>>> this incorrectly used so many times that I can't count them any more.
>>>>>>
>>>>>> Would that be somehow avoidable? Or could you at least explain the use case
>>>>>> a bit better.
>>>>> I didn't see a way, maybe you know of one
>>>> For GEM objects we usually don't use the reference count of the DMA-buf, but
>>>> rather that of the GEM object for this. But that's not an ideal solution
>>>> either.
>>> You can't really ignore the dmabuf refcount. At some point you have to
>>> deal with the dmabuf being asynchronously released by userspace.
>> Yeah, but in this case the dma-buf is just a reference to the real/private
>> object which holds the backing store.
> The gem approach is backwards to what I did here.

As I said, what GEM does is not necessarily the best approach either.

> GEM holds a singleton pointer to the dmabuf and holds a reference on
> it as long as it has the pointer. This means the dmabuf can not be
> freed until the GEM object is freed.
>
> For this I held a "weak reference" on the dmabuf in a list, and we
> convert the weak reference to a strong reference in the usual way
> using a try_get.
>
> The reason it is different is because the VFIO interface allows
> creating a DMABUF with unique parameters on every user request. Eg the
> user can select a BAR index and a slice of the MMIO space unique to
> each request and this results in a unique DMABUF.
>
> Due to this we have to store a list of DMABUFs and we need the
> DMABUFs to clean up their memory when the user closes the file.

Yeah, that makes sense.

>>> So we could delete the try_buf and just rely on move being safe on
>>> partially destroyed dma_buf's as part of the API design.
>> I think that might be the more defensive approach. A comment on the
>> dma_buf_move_notify() function should probably be a good idea.
> IMHO, it is an anti-pattern. The caller should hold a strong reference
> on an object before invoking any API surface. Upgrading a weak
> reference to a strong reference requires the standard "try get" API.
>
> But if you feel strongly I don't mind dropping the try_get around move.

Well I see it as well that both approaches are not ideal, but my gut
feeling tells me that just documenting that dma_buf_move_notify() can
still be called as long as the release callback wasn't called yet is
probably the better approach.

On the other hand this is really just a gut feeling without strong
arguments backing it. So if somebody has an argument which makes try_get
necessary I'm happy to hear it.

Regards,
Christian.

> Jason
On Fri, Aug 19, 2022 at 03:33:04PM +0200, Christian König wrote:
> > > > So we could delete the try_buf and just rely on move being safe on
> > > > partially destroyed dma_buf's as part of the API design.
> > > I think that might be the more defensive approach. A comment on the
> > > dma_buf_move_notify() function should probably be a good idea.
> > IMHO, it is an anti-pattern. The caller should hold a strong reference
> > on an object before invoking any API surface. Upgrading a weak
> > reference to a strong reference requires the standard "try get" API.
> >
> > But if you feel strongly I don't mind dropping the try_get around move.
>
> Well I see it as well that both approaches are not ideal, but my gut feeling
> tells me that just documenting that dma_buf_move_notify() can still be
> called as long as the release callback wasn't called yet is probably the
> better approach.

The comment would say something like:

 "dma_resv_lock(), dma_buf_move_notify(), dma_resv_unlock() may be
  called with a 0 refcount so long as ops->release() hasn't returned"

Which is a really abnormal API design, IMHO.

Jason
Am 19.08.22 um 15:39 schrieb Jason Gunthorpe:
> On Fri, Aug 19, 2022 at 03:33:04PM +0200, Christian König wrote:
>
>>>>> So we could delete the try_buf and just rely on move being safe on
>>>>> partially destroyed dma_buf's as part of the API design.
>>>> I think that might be the more defensive approach. A comment on the
>>>> dma_buf_move_notify() function should probably be a good idea.
>>> IMHO, it is an anti-pattern. The caller should hold a strong reference
>>> on an object before invoking any API surface. Upgrading a weak
>>> reference to a strong reference requires the standard "try get" API.
>>>
>>> But if you feel strongly I don't mind dropping the try_get around move.
>> Well I see it as well that both approaches are not ideal, but my gut feeling
>> tells me that just documenting that dma_buf_move_notify() can still be
>> called as long as the release callback wasn't called yet is probably the
>> better approach.
> The comment would say something like:
>
>  "dma_resv_lock(), dma_buf_move_notify(), dma_resv_unlock() may be
>   called with a 0 refcount so long as ops->release() hasn't returned"
>
> Which is a really abnormal API design, IMHO.

Mhm, Daniel or others, do you have any opinion on that as well?

Thanks,
Christian.

> Jason
On Thu, 18 Aug 2022 09:05:24 -0300
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Wed, Aug 17, 2022 at 01:11:38PM -0300, Jason Gunthorpe wrote:
> > dma-buf has become a way to safely acquire a handle to non-struct page
> > memory that can still have lifetime controlled by the exporter. Notably
> > RDMA can now import dma-buf FDs and build them into MRs which allows for
> > PCI P2P operations. Extend this to allow vfio-pci to export MMIO memory
> > from PCI device BARs.
> >
> > This series supports a use case for SPDK where a NVMe device will be owned
> > by SPDK through VFIO but interacting with a RDMA device. The RDMA device
> > may directly access the NVMe CMB or directly manipulate the NVMe device's
> > doorbell using PCI P2P.
> >
> > However, as a general mechanism, it can support many other scenarios with
> > VFIO. I imagine this dmabuf approach to be usable by iommufd as well for
> > generic and safe P2P mappings.
> >
> > This series goes after the "Break up ioctl dispatch functions to one
> > function per ioctl" series.
> >
> > This is on github: https://github.com/jgunthorpe/linux/commits/vfio_dma_buf
> >
> > Jason Gunthorpe (4):
> >   dma-buf: Add dma_buf_try_get()
> >   vfio: Add vfio_device_get()
> >   vfio_pci: Do not open code pci_try_reset_function()
> >   vfio/pci: Allow MMIO regions to be exported through dma-buf
> >
> >  drivers/vfio/pci/Makefile           |   1 +
> >  drivers/vfio/pci/vfio_pci_config.c  |  22 ++-
> >  drivers/vfio/pci/vfio_pci_core.c    |  33 +++-
> >  drivers/vfio/pci/vfio_pci_dma_buf.c | 265 ++++++++++++++++++++++++++++
>
> I forgot about this..
>
> Alex, do you want to start doing as Linus discussed and I will rename
> this new file to "dma_buf.c"?
>
> Or keep this directory as having the vfio_pci_* prefix for
> consistency?

I have a hard time generating a strong opinion over file name redundancy
relative to directory structure. By my count, over 17% of files in
drivers/ have some file name redundancy to their parent directory
structure (up to two levels).

I see we already have a couple of $FOO_dma_buf.c files in the tree,
virtio and amdgpu among these. In the virtio case this is somewhat
justified, to me at least, as the virtio_dma_buf.h file exists in a
shared include namespace. However, this justification only accounts for
about 1% of such files; for many others the redundancy exists in the
include path as well.

I guess if we don't have a reason other than naming consistency and
accept an end goal to incrementally remove file name vs directory
structure redundancy where it makes sense, sure, name it dma_buf.c. Ugh.

Thanks,
Alex