diff mbox series

Supporting VFIO on nVidia's Orin platform

Message ID CACcEcgQq_yxvjAo7BticTw6ne8S2uUjbCFxPTnWHT24oMkxf=w@mail.gmail.com (mailing list archive)
State New, archived
Headers show
Series Supporting VFIO on nVidia's Orin platform | expand

Commit Message

Michael Williamson Sept. 29, 2024, 3:02 p.m. UTC
Hello,

I've been trying to get VFIO working on nVidia's Orin platform (ARM64) in
order to support moving data efficiently off of an attached FPGA PCIe board
using the SMMU from a user space application.  We have a full application
working on x86/x64 boxes that properly support the VFIO interface.  We're
trying to port support to the Orin.

I'm on nVidia's 5.15.36 branch.  It doesn't work out of the box, as the
tegra194-pcie platform controller is lumped in the same iommu group as the
actual PCIe card.  The acs override patch didn't help to separate them.

I have a patch below that *seems* to work for us, but I will admit I do not
know the implications of what I am doing here.

Can anyone let me know if this is (and why it is) a bad idea, and what really
needs to be done?  Or if this is the wrong mailing list, point me in the right
direction?

Thanks,
Mike

 MODULE_PARM_DESC(allow_unsafe_interrupts,
@@ -1733,8 +1734,18 @@ static int vfio_bus_type(struct device *dev, void *data)
 {
        struct bus_type **bus = data;

-       if (*bus && *bus != dev->bus)
+       /**
+        * [MAW] - hack.  the orin tegra194-pcie is in this group and
+        * reports in as bus-type of "platform".  We will ignore it
+        * in an attempt to get vfio to play along.
+        */
+       if (!strcmp(dev->bus->name,"platform")) {
+               return 0;
+       }
+
+       if (*bus && *bus != dev->bus) {
                return -EINVAL;
+       }

        *bus = dev->bus;

Comments

Yi Liu Sept. 30, 2024, 4:04 a.m. UTC | #1
On 2024/9/29 23:02, Michael Williamson wrote:
> Hello,
> 
> I've been trying to get VFIO working on nVidia's Orin platform (ARM64) in
> order to support moving data efficiently off of an attached FPGA PCIe board
> using the SMMU from a user space application.  We have a full application
> working on x86/x64 boxes that properly support the VFIO interface.  We're
> trying to port support to the Orin.
> 
> I'm on nVidia's 5.15.36 branch.  It doesn't work out of the box, as the
> tegra194-pcie platform controller is lumped in the same iommu group as the
> actual PCIe card.  The acs override patch didn't help to separate them.
> 
> I have a patch below that *seems* to work for us, but I will admit I do not
> know the implications of what I am doing here.

your below patch is to pass the vfio_dev_viable() check I suppose. If you
are sure the tegra194-pcie platform controller will not issue DMA, then it
is fine. If not, you should be careful about it.

> Can anyone let me know if this is (and why it is) a bad idea, and what really
> needs to be done?  Or if this is the wrong mailing list, point me in the right
> direction?

this is the right place to ask. +NV folks I know.

> Thanks,
> Mike
> 
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 818e47fc0896..a598a2204781 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -638,8 +638,15 @@ static struct vfio_device
> *vfio_group_get_device(struct vfio_group *group,
>    * breaks anything, it only does so for user owned devices downstream.  Note
>    * that error notification via MSI can be affected for platforms that handle
>    * MSI within the same IOVA space as DMA.
> + *
> + * [MAW] - the tegra194-pcie driver is a platform PCie device controller and
> + * fails the dev_is_pci() check below.  Not sure if it's because its grouping
> + * needs to be reworked, but I don't know how this is (or if it
> should be) done.
> + * This is a hack to see if we can get it going well enough to use the
> + * SMMU from user space.  The other two devices (for the Orin) in the group
> + * are the host bridge and the PCIe card itself.
>    */
> -static const char * const vfio_driver_allowed[] = { "pci-stub" };
> +static const char * const vfio_driver_allowed[] = { "pci-stub",
> "tegra194-pcie" };
> 
>   static bool vfio_dev_driver_allowed(struct device *dev,
>                                      struct device_driver *drv)
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 66bbb125d761..e34fbe17ae1a 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -45,7 +45,8 @@
>   #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
>   #define DRIVER_DESC     "Type1 IOMMU driver for VFIO"
> 
> -static bool allow_unsafe_interrupts;
> +/** [MAW] - hack, need this set for Orin test, not compiled is module
> currently */
> +static bool allow_unsafe_interrupts = true;
>   module_param_named(allow_unsafe_interrupts,
>                     allow_unsafe_interrupts, bool, S_IRUGO | S_IWUSR);
>   MODULE_PARM_DESC(allow_unsafe_interrupts,
> @@ -1733,8 +1734,18 @@ static int vfio_bus_type(struct device *dev, void *data)
>   {
>          struct bus_type **bus = data;
> 
> -       if (*bus && *bus != dev->bus)
> +       /**
> +        * [MAW] - hack.  the orin tegra194-pcie is in this group and
> +        * reports in as bus-type of "platform".  We will ignore it
> +        * in an attempt to get vfio to play along.
> +        */
> +       if (!strcmp(dev->bus->name,"platform")) {
> +               return 0;
> +       }
> +
> +       if (*bus && *bus != dev->bus) {
>                  return -EINVAL;
> +       }
> 
>          *bus = dev->bus;
>
Michael Williamson Sept. 30, 2024, 9:27 p.m. UTC | #2
On Mon, Sep 30, 2024 at 12:00 AM Yi Liu <yi.l.liu@intel.com> wrote:
>
> On 2024/9/29 23:02, Michael Williamson wrote:
> > Hello,
> >
> > I've been trying to get VFIO working on nVidia's Orin platform (ARM64) in
> > order to support moving data efficiently off of an attached FPGA PCIe board
> > using the SMMU from a user space application.  We have a full application
> > working on x86/x64 boxes that properly support the VFIO interface.  We're
> > trying to port support to the Orin.
> >
> > I'm on nVidia's 5.15.36 branch.  It doesn't work out of the box, as the
> > tegra194-pcie platform controller is lumped in the same iommu group as the
> > actual PCIe card.  The acs override patch didn't help to separate them.
> >
> > I have a patch below that *seems* to work for us, but I will admit I do not
> > know the implications of what I am doing here.
>
> your below patch is to pass the vfio_dev_viable() check I suppose. If you

yes.

> are sure the tegra194-pcie platform controller will not issue DMA, then it
> is fine. If not, you should be careful about it.
>

There are multiple tegra194-pcie platform controllers defined in the dts.  This
one only has the one PCIe slot we are working with and nothing else.  There
are other instances of the tegra194-pcie controllers that have control over
other PCIe devices in the system.  There appear to be two main smmu controllers
that support multiple masters.  The tegra194-pcie platform controller tied
to the PCIe bus I want has a unique smmu/iommu master ID but is sharing the main
smmu controller with other devices.  However, they are all assigned to
different iommu groups, so I think this is OK?

> > Can anyone let me know if this is (and why it is) a bad idea, and what really
> > needs to be done?  Or if this is the wrong mailing list, point me in the right
> > direction?
>
> this is the right place to ask. +NV folks I know.
>

I appreciate the insight.  Thank you.

> > Thanks,
> > Mike
> >
> > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> > index 818e47fc0896..a598a2204781 100644
> > --- a/drivers/vfio/vfio.c
> > +++ b/drivers/vfio/vfio.c
> > @@ -638,8 +638,15 @@ static struct vfio_device
> > *vfio_group_get_device(struct vfio_group *group,
> >    * breaks anything, it only does so for user owned devices downstream.  Note
> >    * that error notification via MSI can be affected for platforms that handle
> >    * MSI within the same IOVA space as DMA.
> > + *
> > + * [MAW] - the tegra194-pcie driver is a platform PCie device controller and
> > + * fails the dev_is_pci() check below.  Not sure if it's because its grouping
> > + * needs to be reworked, but I don't know how this is (or if it
> > should be) done.
> > + * This is a hack to see if we can get it going well enough to use the
> > + * SMMU from user space.  The other two devices (for the Orin) in the group
> > + * are the host bridge and the PCIe card itself.
> >    */
> > -static const char * const vfio_driver_allowed[] = { "pci-stub" };
> > +static const char * const vfio_driver_allowed[] = { "pci-stub",
> > "tegra194-pcie" };
> >
> >   static bool vfio_dev_driver_allowed(struct device *dev,
> >                                      struct device_driver *drv)
> > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > index 66bbb125d761..e34fbe17ae1a 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -45,7 +45,8 @@
> >   #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
> >   #define DRIVER_DESC     "Type1 IOMMU driver for VFIO"
> >
> > -static bool allow_unsafe_interrupts;
> > +/** [MAW] - hack, need this set for Orin test, not compiled is module
> > currently */
> > +static bool allow_unsafe_interrupts = true;
> >   module_param_named(allow_unsafe_interrupts,
> >                     allow_unsafe_interrupts, bool, S_IRUGO | S_IWUSR);
> >   MODULE_PARM_DESC(allow_unsafe_interrupts,
> > @@ -1733,8 +1734,18 @@ static int vfio_bus_type(struct device *dev, void *data)
> >   {
> >          struct bus_type **bus = data;
> >
> > -       if (*bus && *bus != dev->bus)
> > +       /**
> > +        * [MAW] - hack.  the orin tegra194-pcie is in this group and
> > +        * reports in as bus-type of "platform".  We will ignore it
> > +        * in an attempt to get vfio to play along.
> > +        */
> > +       if (!strcmp(dev->bus->name,"platform")) {
> > +               return 0;
> > +       }
> > +
> > +       if (*bus && *bus != dev->bus) {
> >                  return -EINVAL;
> > +       }
> >
> >          *bus = dev->bus;
> >
>
> --
> Regards,
> Yi Liu
Nicolin Chen Sept. 30, 2024, 9:47 p.m. UTC | #3
On Mon, Sep 30, 2024 at 05:27:33PM -0400, Michael Williamson wrote:
> On Mon, Sep 30, 2024 at 12:00 AM Yi Liu <yi.l.liu@intel.com> wrote:
> >
> > On 2024/9/29 23:02, Michael Williamson wrote:
> > > Hello,
> > >
> > > I've been trying to get VFIO working on nVidia's Orin platform (ARM64) in
> > > order to support moving data efficiently off of an attached FPGA PCIe board
> > > using the SMMU from a user space application.  We have a full application
> > > working on x86/x64 boxes that properly support the VFIO interface.  We're
> > > trying to port support to the Orin.
> > >
> > > I'm on nVidia's 5.15.36 branch.  It doesn't work out of the box, as the
> > > tegra194-pcie platform controller is lumped in the same iommu group as the
> > > actual PCIe card.  The acs override patch didn't help to separate them.
> > >
> > > I have a patch below that *seems* to work for us, but I will admit I do not
> > > know the implications of what I am doing here.
> >
> > your below patch is to pass the vfio_dev_viable() check I suppose. If you
> 
> yes.
> 
> > are sure the tegra194-pcie platform controller will not issue DMA, then it
> > is fine. If not, you should be careful about it.
> >
> 
> There are multiple tegra194-pcie platform controllers defined in the dts.  This
> one only has the one PCIe slot we are working with and nothing else.  There
> are other instances of the tegra194-pcie controllers that have control over
> other PCIe devices in the system.  There appear to be two main smmu controllers
> that support multiple masters.  The tegra194-pcie platform controller tied
> to the PCIe bus I want has a unique smmu/iommu master ID but is sharing the main
> smmu controller with other devices.  However, they are all assigned to
> different iommu groups, so I think this is OK?
> 
> > > Can anyone let me know if this is (and why it is) a bad idea, and what really
> > > needs to be done?  Or if this is the wrong mailing list, point me in the right
> > > direction?
> >
> > this is the right place to ask. +NV folks I know.
> >
> 
> I appreciate the insight.  Thank you.

I am not quite sure about tegra194-pcie. +Vidya who's the owner.

Nicolin

> > > Thanks,
> > > Mike
> > >
> > > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> > > index 818e47fc0896..a598a2204781 100644
> > > --- a/drivers/vfio/vfio.c
> > > +++ b/drivers/vfio/vfio.c
> > > @@ -638,8 +638,15 @@ static struct vfio_device
> > > *vfio_group_get_device(struct vfio_group *group,
> > >    * breaks anything, it only does so for user owned devices downstream.  Note
> > >    * that error notification via MSI can be affected for platforms that handle
> > >    * MSI within the same IOVA space as DMA.
> > > + *
> > > + * [MAW] - the tegra194-pcie driver is a platform PCie device controller and
> > > + * fails the dev_is_pci() check below.  Not sure if it's because its grouping
> > > + * needs to be reworked, but I don't know how this is (or if it
> > > should be) done.
> > > + * This is a hack to see if we can get it going well enough to use the
> > > + * SMMU from user space.  The other two devices (for the Orin) in the group
> > > + * are the host bridge and the PCIe card itself.
> > >    */
> > > -static const char * const vfio_driver_allowed[] = { "pci-stub" };
> > > +static const char * const vfio_driver_allowed[] = { "pci-stub",
> > > "tegra194-pcie" };
> > >
> > >   static bool vfio_dev_driver_allowed(struct device *dev,
> > >                                      struct device_driver *drv)
> > > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > > index 66bbb125d761..e34fbe17ae1a 100644
> > > --- a/drivers/vfio/vfio_iommu_type1.c
> > > +++ b/drivers/vfio/vfio_iommu_type1.c
> > > @@ -45,7 +45,8 @@
> > >   #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
> > >   #define DRIVER_DESC     "Type1 IOMMU driver for VFIO"
> > >
> > > -static bool allow_unsafe_interrupts;
> > > +/** [MAW] - hack, need this set for Orin test, not compiled is module
> > > currently */
> > > +static bool allow_unsafe_interrupts = true;
> > >   module_param_named(allow_unsafe_interrupts,
> > >                     allow_unsafe_interrupts, bool, S_IRUGO | S_IWUSR);
> > >   MODULE_PARM_DESC(allow_unsafe_interrupts,
> > > @@ -1733,8 +1734,18 @@ static int vfio_bus_type(struct device *dev, void *data)
> > >   {
> > >          struct bus_type **bus = data;
> > >
> > > -       if (*bus && *bus != dev->bus)
> > > +       /**
> > > +        * [MAW] - hack.  the orin tegra194-pcie is in this group and
> > > +        * reports in as bus-type of "platform".  We will ignore it
> > > +        * in an attempt to get vfio to play along.
> > > +        */
> > > +       if (!strcmp(dev->bus->name,"platform")) {
> > > +               return 0;
> > > +       }
> > > +
> > > +       if (*bus && *bus != dev->bus) {
> > >                  return -EINVAL;
> > > +       }
> > >
> > >          *bus = dev->bus;
> > >
> >
> > --
> > Regards,
> > Yi Liu
> 
> 
> 
> --
> Mike Williamson
> OpenPGP Public Key
diff mbox series

Patch

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 818e47fc0896..a598a2204781 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -638,8 +638,15 @@  static struct vfio_device
*vfio_group_get_device(struct vfio_group *group,
  * breaks anything, it only does so for user owned devices downstream.  Note
  * that error notification via MSI can be affected for platforms that handle
  * MSI within the same IOVA space as DMA.
+ *
+ * [MAW] - the tegra194-pcie driver is a platform PCie device controller and
+ * fails the dev_is_pci() check below.  Not sure if it's because its grouping
+ * needs to be reworked, but I don't know how this is (or if it
should be) done.
+ * This is a hack to see if we can get it going well enough to use the
+ * SMMU from user space.  The other two devices (for the Orin) in the group
+ * are the host bridge and the PCIe card itself.
  */
-static const char * const vfio_driver_allowed[] = { "pci-stub" };
+static const char * const vfio_driver_allowed[] = { "pci-stub",
"tegra194-pcie" };

 static bool vfio_dev_driver_allowed(struct device *dev,
                                    struct device_driver *drv)
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 66bbb125d761..e34fbe17ae1a 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -45,7 +45,8 @@ 
 #define DRIVER_AUTHOR   "Alex Williamson <alex.williamson@redhat.com>"
 #define DRIVER_DESC     "Type1 IOMMU driver for VFIO"

-static bool allow_unsafe_interrupts;
+/** [MAW] - hack, need this set for Orin test, not compiled is module
currently */
+static bool allow_unsafe_interrupts = true;
 module_param_named(allow_unsafe_interrupts,
                   allow_unsafe_interrupts, bool, S_IRUGO | S_IWUSR);