diff mbox series

[mlx5-next,1/7] PCI/IOV: Provide internal VF index

Message ID 8d5bba9a6a1067989c3291fa2929528578812334.1632305919.git.leonro@nvidia.com (mailing list archive)
State New, archived
Headers show
Series Add mlx5 live migration driver | expand

Commit Message

Leon Romanovsky Sept. 22, 2021, 10:38 a.m. UTC
From: Jason Gunthorpe <jgg@nvidia.com>

The PCI core uses the VF index internally, often called the vf_id,
during the setup of the VF, eg pci_iov_add_virtfn().

This index is needed for device drivers that implement live migration
for their internal operations that configure/control their VFs.

Specifically, mlx5_vfio_pci driver that is introduced in coming patches
from this series needs it and not the bus/device/function which is
exposed today.

Add pci_iov_vf_id() which computes the vf_id by reversing the math that
was used to create the bus/device/function.

Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
 drivers/pci/iov.c   | 14 ++++++++++++++
 include/linux/pci.h |  7 ++++++-
 2 files changed, 20 insertions(+), 1 deletion(-)

Comments

Bjorn Helgaas Sept. 22, 2021, 9:59 p.m. UTC | #1
On Wed, Sep 22, 2021 at 01:38:50PM +0300, Leon Romanovsky wrote:
> From: Jason Gunthorpe <jgg@nvidia.com>
> 
> The PCI core uses the VF index internally, often called the vf_id,
> during the setup of the VF, eg pci_iov_add_virtfn().
> 
> This index is needed for device drivers that implement live migration
> for their internal operations that configure/control their VFs.
>
> Specifically, mlx5_vfio_pci driver that is introduced in coming patches
> from this series needs it and not the bus/device/function which is
> exposed today.
> 
> Add pci_iov_vf_id() which computes the vf_id by reversing the math that
> was used to create the bus/device/function.
> 
> Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>

Acked-by: Bjorn Helgaas <bhelgaas@google.com>

mlx5_core_sriov_set_msix_vec_count() looks like it does basically the
same thing as pci_iov_vf_id() by iterating through VFs until it finds
one with a matching devfn (although it *doesn't* check for a matching
bus number, which seems like a bug).

Maybe that should use pci_iov_vf_id()?

> ---
>  drivers/pci/iov.c   | 14 ++++++++++++++
>  include/linux/pci.h |  7 ++++++-
>  2 files changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> index dafdc652fcd0..e7751fa3fe0b 100644
> --- a/drivers/pci/iov.c
> +++ b/drivers/pci/iov.c
> @@ -33,6 +33,20 @@ int pci_iov_virtfn_devfn(struct pci_dev *dev, int vf_id)
>  }
>  EXPORT_SYMBOL_GPL(pci_iov_virtfn_devfn);
>  
> +int pci_iov_vf_id(struct pci_dev *dev)
> +{
> +	struct pci_dev *pf;
> +
> +	if (!dev->is_virtfn)
> +		return -EINVAL;
> +
> +	pf = pci_physfn(dev);
> +	return (((dev->bus->number << 8) + dev->devfn) -
> +		((pf->bus->number << 8) + pf->devfn + pf->sriov->offset)) /
> +	       pf->sriov->stride;
> +}
> +EXPORT_SYMBOL_GPL(pci_iov_vf_id);
> +
>  /*
>   * Per SR-IOV spec sec 3.3.10 and 3.3.11, First VF Offset and VF Stride may
>   * change when NumVFs changes.
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index cd8aa6fce204..4d6c73506e18 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -2153,7 +2153,7 @@ void __iomem *pci_ioremap_wc_bar(struct pci_dev *pdev, int bar);
>  #ifdef CONFIG_PCI_IOV
>  int pci_iov_virtfn_bus(struct pci_dev *dev, int id);
>  int pci_iov_virtfn_devfn(struct pci_dev *dev, int id);
> -
> +int pci_iov_vf_id(struct pci_dev *dev);
>  int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
>  void pci_disable_sriov(struct pci_dev *dev);
>  
> @@ -2181,6 +2181,11 @@ static inline int pci_iov_virtfn_devfn(struct pci_dev *dev, int id)
>  {
>  	return -ENOSYS;
>  }
> +static inline int pci_iov_vf_id(struct pci_dev *dev)
> +{
> +	return -ENOSYS;
> +}
> +

Drop the blank line to match the surrounding stubs.

>  static inline int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn)
>  { return -ENODEV; }
Leon Romanovsky Sept. 23, 2021, 6:35 a.m. UTC | #2
On Wed, Sep 22, 2021 at 04:59:30PM -0500, Bjorn Helgaas wrote:
> On Wed, Sep 22, 2021 at 01:38:50PM +0300, Leon Romanovsky wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > 
> > The PCI core uses the VF index internally, often called the vf_id,
> > during the setup of the VF, eg pci_iov_add_virtfn().
> > 
> > This index is needed for device drivers that implement live migration
> > for their internal operations that configure/control their VFs.
> >
> > Specifically, mlx5_vfio_pci driver that is introduced in coming patches
> > from this series needs it and not the bus/device/function which is
> > exposed today.
> > 
> > Add pci_iov_vf_id() which computes the vf_id by reversing the math that
> > was used to create the bus/device/function.
> > 
> > Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> 
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> 
> mlx5_core_sriov_set_msix_vec_count() looks like it does basically the
> same thing as pci_iov_vf_id() by iterating through VFs until it finds
> one with a matching devfn (although it *doesn't* check for a matching
> bus number, which seems like a bug).
> 
> Maybe that should use pci_iov_vf_id()?

Yes, I gave same comment internally and we decided to simply reduce the
amount of changes in mlx5_core to have less distractions and submit as a
followup. Most likely will add this hunk in v1.

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sriov.c b/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
index e8185b69ac6c..b66be0b4244a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sriov.c
@@ -209,15 +209,8 @@ int mlx5_core_sriov_set_msix_vec_count(struct pci_dev *vf, int msix_vec_count)
        /* Reversed translation of PCI VF function number to the internal
         * function_id, which exists in the name of virtfn symlink.
         */
-       for (id = 0; id < pci_num_vf(pf); id++) {
-               if (!sriov->vfs_ctx[id].enabled)
-                       continue;
-
-               if (vf->devfn == pci_iov_virtfn_devfn(pf, id))
-                       break;
-       }
-
-       if (id == pci_num_vf(pf) || !sriov->vfs_ctx[id].enabled)
+       id = pci_iov_vf_id(vf);
+       if (id < 0 || !sriov->vfs_ctx[id].enabled)
                return -EINVAL;

        return mlx5_set_msix_vec_count(dev, id + 1, msix_vec_count);

Thanks

> 
> > ---
> >  drivers/pci/iov.c   | 14 ++++++++++++++
> >  include/linux/pci.h |  7 ++++++-
> >  2 files changed, 20 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> > index dafdc652fcd0..e7751fa3fe0b 100644
> > --- a/drivers/pci/iov.c
> > +++ b/drivers/pci/iov.c
> > @@ -33,6 +33,20 @@ int pci_iov_virtfn_devfn(struct pci_dev *dev, int vf_id)
> >  }
> >  EXPORT_SYMBOL_GPL(pci_iov_virtfn_devfn);
> >  
> > +int pci_iov_vf_id(struct pci_dev *dev)
> > +{
> > +	struct pci_dev *pf;
> > +
> > +	if (!dev->is_virtfn)
> > +		return -EINVAL;
> > +
> > +	pf = pci_physfn(dev);
> > +	return (((dev->bus->number << 8) + dev->devfn) -
> > +		((pf->bus->number << 8) + pf->devfn + pf->sriov->offset)) /
> > +	       pf->sriov->stride;
> > +}
> > +EXPORT_SYMBOL_GPL(pci_iov_vf_id);
> > +
> >  /*
> >   * Per SR-IOV spec sec 3.3.10 and 3.3.11, First VF Offset and VF Stride may
> >   * change when NumVFs changes.
> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > index cd8aa6fce204..4d6c73506e18 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -2153,7 +2153,7 @@ void __iomem *pci_ioremap_wc_bar(struct pci_dev *pdev, int bar);
> >  #ifdef CONFIG_PCI_IOV
> >  int pci_iov_virtfn_bus(struct pci_dev *dev, int id);
> >  int pci_iov_virtfn_devfn(struct pci_dev *dev, int id);
> > -
> > +int pci_iov_vf_id(struct pci_dev *dev);
> >  int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
> >  void pci_disable_sriov(struct pci_dev *dev);
> >  
> > @@ -2181,6 +2181,11 @@ static inline int pci_iov_virtfn_devfn(struct pci_dev *dev, int id)
> >  {
> >  	return -ENOSYS;
> >  }
> > +static inline int pci_iov_vf_id(struct pci_dev *dev)
> > +{
> > +	return -ENOSYS;
> > +}
> > +
> 
> Drop the blank line to match the surrounding stubs.

Sure, thanks

> 
> >  static inline int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn)
> >  { return -ENODEV; }
>
Bjorn Helgaas Sept. 24, 2021, 1:08 p.m. UTC | #3
On Thu, Sep 23, 2021 at 09:35:32AM +0300, Leon Romanovsky wrote:
> On Wed, Sep 22, 2021 at 04:59:30PM -0500, Bjorn Helgaas wrote:
> > On Wed, Sep 22, 2021 at 01:38:50PM +0300, Leon Romanovsky wrote:
> > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > 
> > > The PCI core uses the VF index internally, often called the vf_id,
> > > during the setup of the VF, eg pci_iov_add_virtfn().
> > > 
> > > This index is needed for device drivers that implement live migration
> > > for their internal operations that configure/control their VFs.
> > >
> > > Specifically, mlx5_vfio_pci driver that is introduced in coming patches
> > > from this series needs it and not the bus/device/function which is
> > > exposed today.
> > > 
> > > Add pci_iov_vf_id() which computes the vf_id by reversing the math that
> > > was used to create the bus/device/function.
> > > 
> > > Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > 
> > Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> > 
> > mlx5_core_sriov_set_msix_vec_count() looks like it does basically the
> > same thing as pci_iov_vf_id() by iterating through VFs until it finds
> > one with a matching devfn (although it *doesn't* check for a matching
> > bus number, which seems like a bug).
> > 
> > Maybe that should use pci_iov_vf_id()?
> 
> Yes, I gave same comment internally and we decided to simply reduce the
> amount of changes in mlx5_core to have less distractions and submit as a
> followup. Most likely will add this hunk in v1.

I guess it backfired as far as reducing distractions, because now it
just looks like a job half-done.

And it still looks like the existing code is buggy.  This is called
via sysfs, so if the PF is on bus X and the user writes to
sriov_vf_msix_count for a VF on bus X+1, it looks like
mlx5_core_sriov_set_msix_vec_count() will set the count for the wrong
VF.

Bjorn
Leon Romanovsky Sept. 25, 2021, 10:10 a.m. UTC | #4
On Fri, Sep 24, 2021 at 08:08:45AM -0500, Bjorn Helgaas wrote:
> On Thu, Sep 23, 2021 at 09:35:32AM +0300, Leon Romanovsky wrote:
> > On Wed, Sep 22, 2021 at 04:59:30PM -0500, Bjorn Helgaas wrote:
> > > On Wed, Sep 22, 2021 at 01:38:50PM +0300, Leon Romanovsky wrote:
> > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > 
> > > > The PCI core uses the VF index internally, often called the vf_id,
> > > > during the setup of the VF, eg pci_iov_add_virtfn().
> > > > 
> > > > This index is needed for device drivers that implement live migration
> > > > for their internal operations that configure/control their VFs.
> > > >
> > > > Specifically, mlx5_vfio_pci driver that is introduced in coming patches
> > > > from this series needs it and not the bus/device/function which is
> > > > exposed today.
> > > > 
> > > > Add pci_iov_vf_id() which computes the vf_id by reversing the math that
> > > > was used to create the bus/device/function.
> > > > 
> > > > Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> > > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > > > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > > 
> > > Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> > > 
> > > mlx5_core_sriov_set_msix_vec_count() looks like it does basically the
> > > same thing as pci_iov_vf_id() by iterating through VFs until it finds
> > > one with a matching devfn (although it *doesn't* check for a matching
> > > bus number, which seems like a bug).
> > > 
> > > Maybe that should use pci_iov_vf_id()?
> > 
> > Yes, I gave same comment internally and we decided to simply reduce the
> > amount of changes in mlx5_core to have less distractions and submit as a
> > followup. Most likely will add this hunk in v1.
> 
> I guess it backfired as far as reducing distractions, because now it
> just looks like a job half-done.

Partially :)
I didn't expect to see acceptance of this series in v0, we wanted to
gather feedback as early as possible.

> 
> And it still looks like the existing code is buggy.  This is called
> via sysfs, so if the PF is on bus X and the user writes to
> sriov_vf_msix_count for a VF on bus X+1, it looks like
> mlx5_core_sriov_set_msix_vec_count() will set the count for the wrong
> VF.

In mlx5_core_sriov_set_msix_vec_count(), we receive VF that is connected
to PF which has "struct mlx5_core_dev". My expectation is that they share
same bus as that PF was the one who created VFs. The mlx5 devices supports
upto 256 VFs and it is far below the bus split mentioned in PCI spec.

How can VF and their respective PF have different bus numbers?

Thanks

> 
> Bjorn
Bjorn Helgaas Sept. 25, 2021, 5:41 p.m. UTC | #5
On Sat, Sep 25, 2021 at 01:10:39PM +0300, Leon Romanovsky wrote:
> On Fri, Sep 24, 2021 at 08:08:45AM -0500, Bjorn Helgaas wrote:
> > On Thu, Sep 23, 2021 at 09:35:32AM +0300, Leon Romanovsky wrote:
> > > On Wed, Sep 22, 2021 at 04:59:30PM -0500, Bjorn Helgaas wrote:
> > > > On Wed, Sep 22, 2021 at 01:38:50PM +0300, Leon Romanovsky wrote:
> > > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > > 
> > > > > The PCI core uses the VF index internally, often called the vf_id,
> > > > > during the setup of the VF, eg pci_iov_add_virtfn().
> > > > > 
> > > > > This index is needed for device drivers that implement live migration
> > > > > for their internal operations that configure/control their VFs.
> > > > >
> > > > > Specifically, mlx5_vfio_pci driver that is introduced in coming patches
> > > > > from this series needs it and not the bus/device/function which is
> > > > > exposed today.
> > > > > 
> > > > > Add pci_iov_vf_id() which computes the vf_id by reversing the math that
> > > > > was used to create the bus/device/function.
> > > > > 
> > > > > Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> > > > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > > > > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > > > 
> > > > Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> > > > 
> > > > mlx5_core_sriov_set_msix_vec_count() looks like it does basically the
> > > > same thing as pci_iov_vf_id() by iterating through VFs until it finds
> > > > one with a matching devfn (although it *doesn't* check for a matching
> > > > bus number, which seems like a bug).
> ...

> > And it still looks like the existing code is buggy.  This is called
> > via sysfs, so if the PF is on bus X and the user writes to
> > sriov_vf_msix_count for a VF on bus X+1, it looks like
> > mlx5_core_sriov_set_msix_vec_count() will set the count for the wrong
> > VF.
> 
> In mlx5_core_sriov_set_msix_vec_count(), we receive VF that is connected
> to PF which has "struct mlx5_core_dev". My expectation is that they share
> same bus as that PF was the one who created VFs. The mlx5 devices supports
> upto 256 VFs and it is far below the bus split mentioned in PCI spec.
> 
> How can VF and their respective PF have different bus numbers?

See PCIe r5.0, sec 9.2.1.2.  For example,

  PF 0 on bus 20
    First VF Offset   1
    VF Stride         1
    NumVFs          511
  VF 0,1   through VF 0,255 on bus 20
  VF 0,256 through VF 0,511 on bus 21

This is implemented in pci_iov_add_virtfn(), which computes the bus
number and devfn from the VF ID.

pci_iov_virtfn_devfn(VF 0,1) == pci_iov_virtfn_devfn(VF 0,256), so if
the user writes to sriov_vf_msix_count for VF 0,256, it looks like
we'll call mlx5_set_msix_vec_count() for VF 0,1 instead of VF 0,256.

The spec encourages devices that require no more than 256 devices to
locate them all on the same bus number (PCIe r5.0, sec 9.1), so if you
only have 255 VFs, you may avoid the problem.

But in mlx5_core_sriov_set_msix_vec_count(), it's not obvious that it
is safe to assume the bus number is the same.

Bjorn
Leon Romanovsky Sept. 26, 2021, 6:36 a.m. UTC | #6
On Sat, Sep 25, 2021 at 12:41:15PM -0500, Bjorn Helgaas wrote:
> On Sat, Sep 25, 2021 at 01:10:39PM +0300, Leon Romanovsky wrote:
> > On Fri, Sep 24, 2021 at 08:08:45AM -0500, Bjorn Helgaas wrote:
> > > On Thu, Sep 23, 2021 at 09:35:32AM +0300, Leon Romanovsky wrote:
> > > > On Wed, Sep 22, 2021 at 04:59:30PM -0500, Bjorn Helgaas wrote:
> > > > > On Wed, Sep 22, 2021 at 01:38:50PM +0300, Leon Romanovsky wrote:
> > > > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > > > 
> > > > > > The PCI core uses the VF index internally, often called the vf_id,
> > > > > > during the setup of the VF, eg pci_iov_add_virtfn().
> > > > > > 
> > > > > > This index is needed for device drivers that implement live migration
> > > > > > for their internal operations that configure/control their VFs.
> > > > > >
> > > > > > Specifically, mlx5_vfio_pci driver that is introduced in coming patches
> > > > > > from this series needs it and not the bus/device/function which is
> > > > > > exposed today.
> > > > > > 
> > > > > > Add pci_iov_vf_id() which computes the vf_id by reversing the math that
> > > > > > was used to create the bus/device/function.
> > > > > > 
> > > > > > Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> > > > > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > > > > > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > > > > 
> > > > > Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> > > > > 
> > > > > mlx5_core_sriov_set_msix_vec_count() looks like it does basically the
> > > > > same thing as pci_iov_vf_id() by iterating through VFs until it finds
> > > > > one with a matching devfn (although it *doesn't* check for a matching
> > > > > bus number, which seems like a bug).
> > ...
> 
> > > And it still looks like the existing code is buggy.  This is called
> > > via sysfs, so if the PF is on bus X and the user writes to
> > > sriov_vf_msix_count for a VF on bus X+1, it looks like
> > > mlx5_core_sriov_set_msix_vec_count() will set the count for the wrong
> > > VF.
> > 
> > In mlx5_core_sriov_set_msix_vec_count(), we receive VF that is connected
> > to PF which has "struct mlx5_core_dev". My expectation is that they share
> > same bus as that PF was the one who created VFs. The mlx5 devices supports
> > upto 256 VFs and it is far below the bus split mentioned in PCI spec.
> > 
> > How can VF and their respective PF have different bus numbers?
> 
> See PCIe r5.0, sec 9.2.1.2.  For example,
> 
>   PF 0 on bus 20
>     First VF Offset   1
>     VF Stride         1
>     NumVFs          511
>   VF 0,1   through VF 0,255 on bus 20
>   VF 0,256 through VF 0,511 on bus 21
> 
> This is implemented in pci_iov_add_virtfn(), which computes the bus
> number and devfn from the VF ID.
> 
> pci_iov_virtfn_devfn(VF 0,1) == pci_iov_virtfn_devfn(VF 0,256), so if
> the user writes to sriov_vf_msix_count for VF 0,256, it looks like
> we'll call mlx5_set_msix_vec_count() for VF 0,1 instead of VF 0,256.

This is PCI spec split that I mentioned.

> 
> The spec encourages devices that require no more than 256 devices to
> locate them all on the same bus number (PCIe r5.0, sec 9.1), so if you
> only have 255 VFs, you may avoid the problem.
> 
> But in mlx5_core_sriov_set_msix_vec_count(), it's not obvious that it
> is safe to assume the bus number is the same.

No problem, we will make it more clear.

> 
> Bjorn
Bjorn Helgaas Sept. 26, 2021, 8:23 p.m. UTC | #7
On Sun, Sep 26, 2021 at 09:36:49AM +0300, Leon Romanovsky wrote:
> On Sat, Sep 25, 2021 at 12:41:15PM -0500, Bjorn Helgaas wrote:
> > On Sat, Sep 25, 2021 at 01:10:39PM +0300, Leon Romanovsky wrote:
> > > On Fri, Sep 24, 2021 at 08:08:45AM -0500, Bjorn Helgaas wrote:
> > > > On Thu, Sep 23, 2021 at 09:35:32AM +0300, Leon Romanovsky wrote:
> > > > > On Wed, Sep 22, 2021 at 04:59:30PM -0500, Bjorn Helgaas wrote:
> > > > > > On Wed, Sep 22, 2021 at 01:38:50PM +0300, Leon Romanovsky wrote:
> > > > > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > > > > 
> > > > > > > The PCI core uses the VF index internally, often called the vf_id,
> > > > > > > during the setup of the VF, eg pci_iov_add_virtfn().
> > > > > > > 
> > > > > > > This index is needed for device drivers that implement live migration
> > > > > > > for their internal operations that configure/control their VFs.
> > > > > > >
> > > > > > > Specifically, mlx5_vfio_pci driver that is introduced in coming patches
> > > > > > > from this series needs it and not the bus/device/function which is
> > > > > > > exposed today.
> > > > > > > 
> > > > > > > Add pci_iov_vf_id() which computes the vf_id by reversing the math that
> > > > > > > was used to create the bus/device/function.
> > > > > > > 
> > > > > > > Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> > > > > > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > > > > > > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > > > > > 
> > > > > > Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> > > > > > 
> > > > > > mlx5_core_sriov_set_msix_vec_count() looks like it does basically the
> > > > > > same thing as pci_iov_vf_id() by iterating through VFs until it finds
> > > > > > one with a matching devfn (although it *doesn't* check for a matching
> > > > > > bus number, which seems like a bug).
> > > ...
> > 
> > > > And it still looks like the existing code is buggy.  This is called
> > > > via sysfs, so if the PF is on bus X and the user writes to
> > > > sriov_vf_msix_count for a VF on bus X+1, it looks like
> > > > mlx5_core_sriov_set_msix_vec_count() will set the count for the wrong
> > > > VF.
> > > 
> > > In mlx5_core_sriov_set_msix_vec_count(), we receive VF that is connected
> > > to PF which has "struct mlx5_core_dev". My expectation is that they share
> > > same bus as that PF was the one who created VFs. The mlx5 devices supports
> > > upto 256 VFs and it is far below the bus split mentioned in PCI spec.
> > > 
> > > How can VF and their respective PF have different bus numbers?
> > 
> > See PCIe r5.0, sec 9.2.1.2.  For example,
> > 
> >   PF 0 on bus 20
> >     First VF Offset   1
> >     VF Stride         1
> >     NumVFs          511
> >   VF 0,1   through VF 0,255 on bus 20
> >   VF 0,256 through VF 0,511 on bus 21
> > 
> > This is implemented in pci_iov_add_virtfn(), which computes the bus
> > number and devfn from the VF ID.
> > 
> > pci_iov_virtfn_devfn(VF 0,1) == pci_iov_virtfn_devfn(VF 0,256), so if
> > the user writes to sriov_vf_msix_count for VF 0,256, it looks like
> > we'll call mlx5_set_msix_vec_count() for VF 0,1 instead of VF 0,256.
> 
> This is PCI spec split that I mentioned.
> 
> > 
> > The spec encourages devices that require no more than 256 devices to
> > locate them all on the same bus number (PCIe r5.0, sec 9.1), so if you
> > only have 255 VFs, you may avoid the problem.
> > 
> > But in mlx5_core_sriov_set_msix_vec_count(), it's not obvious that it
> > is safe to assume the bus number is the same.
> 
> No problem, we will make it more clear.

IMHO you should resolve it by using the new interface.  Better
performing, unambiguous regardless of how many VFs the device
supports.  What's the down side?
Leon Romanovsky Sept. 27, 2021, 11:55 a.m. UTC | #8
On Sun, Sep 26, 2021 at 03:23:41PM -0500, Bjorn Helgaas wrote:
> On Sun, Sep 26, 2021 at 09:36:49AM +0300, Leon Romanovsky wrote:
> > On Sat, Sep 25, 2021 at 12:41:15PM -0500, Bjorn Helgaas wrote:
> > > On Sat, Sep 25, 2021 at 01:10:39PM +0300, Leon Romanovsky wrote:
> > > > On Fri, Sep 24, 2021 at 08:08:45AM -0500, Bjorn Helgaas wrote:
> > > > > On Thu, Sep 23, 2021 at 09:35:32AM +0300, Leon Romanovsky wrote:
> > > > > > On Wed, Sep 22, 2021 at 04:59:30PM -0500, Bjorn Helgaas wrote:
> > > > > > > On Wed, Sep 22, 2021 at 01:38:50PM +0300, Leon Romanovsky wrote:
> > > > > > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > > > > > 
> > > > > > > > The PCI core uses the VF index internally, often called the vf_id,
> > > > > > > > during the setup of the VF, eg pci_iov_add_virtfn().
> > > > > > > > 
> > > > > > > > This index is needed for device drivers that implement live migration
> > > > > > > > for their internal operations that configure/control their VFs.
> > > > > > > >
> > > > > > > > Specifically, mlx5_vfio_pci driver that is introduced in coming patches
> > > > > > > > from this series needs it and not the bus/device/function which is
> > > > > > > > exposed today.
> > > > > > > > 
> > > > > > > > Add pci_iov_vf_id() which computes the vf_id by reversing the math that
> > > > > > > > was used to create the bus/device/function.
> > > > > > > > 
> > > > > > > > Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> > > > > > > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > > > > > > > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > > > > > > 
> > > > > > > Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> > > > > > > 
> > > > > > > mlx5_core_sriov_set_msix_vec_count() looks like it does basically the
> > > > > > > same thing as pci_iov_vf_id() by iterating through VFs until it finds
> > > > > > > one with a matching devfn (although it *doesn't* check for a matching
> > > > > > > bus number, which seems like a bug).
> > > > ...
> > > 
> > > > > And it still looks like the existing code is buggy.  This is called
> > > > > via sysfs, so if the PF is on bus X and the user writes to
> > > > > sriov_vf_msix_count for a VF on bus X+1, it looks like
> > > > > mlx5_core_sriov_set_msix_vec_count() will set the count for the wrong
> > > > > VF.
> > > > 
> > > > In mlx5_core_sriov_set_msix_vec_count(), we receive VF that is connected
> > > > to PF which has "struct mlx5_core_dev". My expectation is that they share
> > > > same bus as that PF was the one who created VFs. The mlx5 devices supports
> > > > upto 256 VFs and it is far below the bus split mentioned in PCI spec.
> > > > 
> > > > How can VF and their respective PF have different bus numbers?
> > > 
> > > See PCIe r5.0, sec 9.2.1.2.  For example,
> > > 
> > >   PF 0 on bus 20
> > >     First VF Offset   1
> > >     VF Stride         1
> > >     NumVFs          511
> > >   VF 0,1   through VF 0,255 on bus 20
> > >   VF 0,256 through VF 0,511 on bus 21
> > > 
> > > This is implemented in pci_iov_add_virtfn(), which computes the bus
> > > number and devfn from the VF ID.
> > > 
> > > pci_iov_virtfn_devfn(VF 0,1) == pci_iov_virtfn_devfn(VF 0,256), so if
> > > the user writes to sriov_vf_msix_count for VF 0,256, it looks like
> > > we'll call mlx5_set_msix_vec_count() for VF 0,1 instead of VF 0,256.
> > 
> > This is PCI spec split that I mentioned.
> > 
> > > 
> > > The spec encourages devices that require no more than 256 devices to
> > > locate them all on the same bus number (PCIe r5.0, sec 9.1), so if you
> > > only have 255 VFs, you may avoid the problem.
> > > 
> > > But in mlx5_core_sriov_set_msix_vec_count(), it's not obvious that it
> > > is safe to assume the bus number is the same.
> > 
> > No problem, we will make it more clear.
> 
> IMHO you should resolve it by using the new interface.  Better
> performing, unambiguous regardless of how many VFs the device
> supports.  What's the down side?

I don't see any. My previous answer worth to be written.
"No problem, we will make it more clear with this new function".

Thanks
Bjorn Helgaas Sept. 27, 2021, 2:47 p.m. UTC | #9
On Mon, Sep 27, 2021 at 02:55:24PM +0300, Leon Romanovsky wrote:
> On Sun, Sep 26, 2021 at 03:23:41PM -0500, Bjorn Helgaas wrote:
> > On Sun, Sep 26, 2021 at 09:36:49AM +0300, Leon Romanovsky wrote:
> > > On Sat, Sep 25, 2021 at 12:41:15PM -0500, Bjorn Helgaas wrote:
> > > > On Sat, Sep 25, 2021 at 01:10:39PM +0300, Leon Romanovsky wrote:
> > > > > On Fri, Sep 24, 2021 at 08:08:45AM -0500, Bjorn Helgaas wrote:
> > > > > > On Thu, Sep 23, 2021 at 09:35:32AM +0300, Leon Romanovsky wrote:
> > > > > > > On Wed, Sep 22, 2021 at 04:59:30PM -0500, Bjorn Helgaas wrote:
> > > > > > > > On Wed, Sep 22, 2021 at 01:38:50PM +0300, Leon Romanovsky wrote:
> > > > > > > > > From: Jason Gunthorpe <jgg@nvidia.com>
> > > > > > > > > 
> > > > > > > > > The PCI core uses the VF index internally, often called the vf_id,
> > > > > > > > > during the setup of the VF, eg pci_iov_add_virtfn().
> > > > > > > > > 
> > > > > > > > > This index is needed for device drivers that implement live migration
> > > > > > > > > for their internal operations that configure/control their VFs.
> > > > > > > > >
> > > > > > > > > Specifically, mlx5_vfio_pci driver that is introduced in coming patches
> > > > > > > > > from this series needs it and not the bus/device/function which is
> > > > > > > > > exposed today.
> > > > > > > > > 
> > > > > > > > > Add pci_iov_vf_id() which computes the vf_id by reversing the math that
> > > > > > > > > was used to create the bus/device/function.
> > > > > > > > > 
> > > > > > > > > Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
> > > > > > > > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > > > > > > > > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > > > > > > > 
> > > > > > > > Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> > > > > > > > 
> > > > > > > > mlx5_core_sriov_set_msix_vec_count() looks like it does basically the
> > > > > > > > same thing as pci_iov_vf_id() by iterating through VFs until it finds
> > > > > > > > one with a matching devfn (although it *doesn't* check for a matching
> > > > > > > > bus number, which seems like a bug).
> > > > > ...
> > > > 
> > > > > > And it still looks like the existing code is buggy.  This is called
> > > > > > via sysfs, so if the PF is on bus X and the user writes to
> > > > > > sriov_vf_msix_count for a VF on bus X+1, it looks like
> > > > > > mlx5_core_sriov_set_msix_vec_count() will set the count for the wrong
> > > > > > VF.
> > > > > 
> > > > > In mlx5_core_sriov_set_msix_vec_count(), we receive VF that is connected
> > > > > to PF which has "struct mlx5_core_dev". My expectation is that they share
> > > > > same bus as that PF was the one who created VFs. The mlx5 devices supports
> > > > > upto 256 VFs and it is far below the bus split mentioned in PCI spec.
> > > > > 
> > > > > How can VF and their respective PF have different bus numbers?
> > > > 
> > > > See PCIe r5.0, sec 9.2.1.2.  For example,
> > > > 
> > > >   PF 0 on bus 20
> > > >     First VF Offset   1
> > > >     VF Stride         1
> > > >     NumVFs          511
> > > >   VF 0,1   through VF 0,255 on bus 20
> > > >   VF 0,256 through VF 0,511 on bus 21
> > > > 
> > > > This is implemented in pci_iov_add_virtfn(), which computes the bus
> > > > number and devfn from the VF ID.
> > > > 
> > > > pci_iov_virtfn_devfn(VF 0,1) == pci_iov_virtfn_devfn(VF 0,256), so if
> > > > the user writes to sriov_vf_msix_count for VF 0,256, it looks like
> > > > we'll call mlx5_set_msix_vec_count() for VF 0,1 instead of VF 0,256.
> > > 
> > > This is PCI spec split that I mentioned.
> > > 
> > > > 
> > > > The spec encourages devices that require no more than 256 devices to
> > > > locate them all on the same bus number (PCIe r5.0, sec 9.1), so if you
> > > > only have 255 VFs, you may avoid the problem.
> > > > 
> > > > But in mlx5_core_sriov_set_msix_vec_count(), it's not obvious that it
> > > > is safe to assume the bus number is the same.
> > > 
> > > No problem, we will make it more clear.
> > 
> > IMHO you should resolve it by using the new interface.  Better
> > performing, unambiguous regardless of how many VFs the device
> > supports.  What's the down side?
> 
> I don't see any. My previous answer worth to be written.
> "No problem, we will make it more clear with this new function".

Great, sorry I missed that nuance :)
diff mbox series

Patch

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index dafdc652fcd0..e7751fa3fe0b 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -33,6 +33,20 @@  int pci_iov_virtfn_devfn(struct pci_dev *dev, int vf_id)
 }
 EXPORT_SYMBOL_GPL(pci_iov_virtfn_devfn);
 
+int pci_iov_vf_id(struct pci_dev *dev)
+{
+	struct pci_dev *pf;
+
+	if (!dev->is_virtfn)
+		return -EINVAL;
+
+	pf = pci_physfn(dev);
+	return (((dev->bus->number << 8) + dev->devfn) -
+		((pf->bus->number << 8) + pf->devfn + pf->sriov->offset)) /
+	       pf->sriov->stride;
+}
+EXPORT_SYMBOL_GPL(pci_iov_vf_id);
+
 /*
  * Per SR-IOV spec sec 3.3.10 and 3.3.11, First VF Offset and VF Stride may
  * change when NumVFs changes.
diff --git a/include/linux/pci.h b/include/linux/pci.h
index cd8aa6fce204..4d6c73506e18 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -2153,7 +2153,7 @@  void __iomem *pci_ioremap_wc_bar(struct pci_dev *pdev, int bar);
 #ifdef CONFIG_PCI_IOV
 int pci_iov_virtfn_bus(struct pci_dev *dev, int id);
 int pci_iov_virtfn_devfn(struct pci_dev *dev, int id);
-
+int pci_iov_vf_id(struct pci_dev *dev);
 int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn);
 void pci_disable_sriov(struct pci_dev *dev);
 
@@ -2181,6 +2181,11 @@  static inline int pci_iov_virtfn_devfn(struct pci_dev *dev, int id)
 {
 	return -ENOSYS;
 }
+static inline int pci_iov_vf_id(struct pci_dev *dev)
+{
+	return -ENOSYS;
+}
+
 static inline int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn)
 { return -ENODEV; }