
[v2] PCI: vmd: Clear PCI_INTERRUPT_LINE for VMD sub-devices

Message ID 20240820223213.210929-1-nirmal.patel@linux.intel.com (mailing list archive)
State Rejected
Delegated to: Manivannan Sadhasivam
Series: [v2] PCI: vmd: Clear PCI_INTERRUPT_LINE for VMD sub-devices

Commit Message

Nirmal Patel Aug. 20, 2024, 10:32 p.m. UTC
VMD does not support INTx for devices downstream from a VMD endpoint.
So initialize the PCI_INTERRUPT_LINE to 0 for all NVMe devices under
VMD to ensure other applications don't try to set up an INTx for them.

Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
---
v1->v2: Move the fix from fixup.c to vmd.c
---
 drivers/pci/controller/vmd.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)
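
For illustration, a minimal userspace sketch (not SPDK's actual code; the BDF path is a placeholder) of the kind of check the commit message is guarding against: reading PCI_INTERRUPT_LINE (config space offset 0x3c) through sysfs and treating a non-zero value as "INTx is wired":

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* Placeholder device; a real tool would iterate the devices it owns. */
	const char *cfg = "/sys/bus/pci/devices/0000:01:00.0/config";
	FILE *f = fopen(cfg, "rb");
	uint8_t line;

	if (!f)
		return 1;
	if (fseek(f, 0x3c, SEEK_SET) != 0 || fread(&line, 1, 1, f) != 1) {
		fclose(f);
		return 1;
	}
	fclose(f);

	/* 0x00 conventionally means "no legacy interrupt routed". */
	printf("PCI_INTERRUPT_LINE = 0x%02x -> %s\n", line,
	       line ? "a naive tool may assume INTx" : "no INTx assumed");
	return 0;
}

Devices behind VMD get their interrupts through VMD's MSI-X remapping, so whatever value sits in this register is stale; the patch tries to make that explicit by clearing it.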

Comments

Manivannan Sadhasivam Aug. 22, 2024, 9:48 a.m. UTC | #1
On Tue, Aug 20, 2024 at 03:32:13PM -0700, Nirmal Patel wrote:
> VMD does not support INTx for devices downstream from a VMD endpoint.
> So initialize the PCI_INTERRUPT_LINE to 0 for all NVMe devices under
> VMD to ensure other applications don't try to set up an INTx for them.
> 
> Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>

I shared a diff to put it in pci_assign_irq() and you said that you were going
to test it [1]. I don't see a reply to that, and now you came up with another
approach.

What happened in between?

- Mani

[1] https://lore.kernel.org/linux-pci/20240801115756.0000272e@linux.intel.com
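
For readers without [1] at hand, a rough sketch of the shape of that proposal, reconstructed from this thread (in particular the later exchange about host bridges with no map_irq()); this is an assumed form of the approach, not the actual diff from [1]:

/*
 * Sketch only: if the host bridge provides no map_irq() hook, no INTx
 * routing exists for the device, so clear PCI_INTERRUPT_LINE instead of
 * leaving whatever value firmware put there.
 */
void pci_assign_irq(struct pci_dev *dev)
{
	struct pci_host_bridge *hbrg = pci_find_host_bridge(dev->bus);

	if (!hbrg->map_irq) {
		pci_dbg(dev, "runtime IRQ mapping not provided by arch\n");
		pci_write_config_byte(dev, PCI_INTERRUPT_LINE, 0);
		return;
	}

	/* ... existing swizzling and map_irq() handling unchanged ... */
}

The register write is the same one the VMD patch performs, but applied generically in the PCI core to any bus that has no INTx routing.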

> ---
> v2->v1: Change the execution from fixup.c to vmd.c
> ---
>  drivers/pci/controller/vmd.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> index a726de0af011..2e9b99969b81 100644
> --- a/drivers/pci/controller/vmd.c
> +++ b/drivers/pci/controller/vmd.c
> @@ -778,6 +778,18 @@ static int vmd_pm_enable_quirk(struct pci_dev *pdev, void *userdata)
>  	return 0;
>  }
>  
> +/*
> + * Some applications like SPDK reads PCI_INTERRUPT_LINE to decide
> + * whether INTx is enabled or not. Since VMD doesn't support INTx,
> + * write 0 to all NVMe devices under VMD.
> + */
> +static int vmd_clr_int_line_reg(struct pci_dev *dev, void *userdata)
> +{
> +	if(dev->class == PCI_CLASS_STORAGE_EXPRESS)
> +		pci_write_config_byte(dev, PCI_INTERRUPT_LINE, 0);
> +	return 0;
> +}
> +
>  static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
>  {
>  	struct pci_sysdata *sd = &vmd->sysdata;
> @@ -932,6 +944,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
>  
>  	pci_scan_child_bus(vmd->bus);
>  	vmd_domain_reset(vmd);
> +	pci_walk_bus(vmd->bus, vmd_clr_int_line_reg, &features);
>  
>  	/* When Intel VMD is enabled, the OS does not discover the Root Ports
>  	 * owned by Intel VMD within the MMCFG space. pci_reset_bus() applies
> -- 
> 2.39.1
> 
>
Nirmal Patel Aug. 22, 2024, 6:30 p.m. UTC | #2
On Thu, 22 Aug 2024 15:18:06 +0530
Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> wrote:

> On Tue, Aug 20, 2024 at 03:32:13PM -0700, Nirmal Patel wrote:
> > VMD does not support INTx for devices downstream from a VMD
> > endpoint. So initialize the PCI_INTERRUPT_LINE to 0 for all NVMe
> > devices under VMD to ensure other applications don't try to set up
> > an INTx for them.
> > 
> > Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>  
> 
> I shared a diff to put it in pci_assign_irq() and you said that you
> were going to test it [1]. I don't see a reply to that and now you
> came up with another approach.
> 
> What happened inbetween?

Apologies, I did perform the tests and the patch worked fine. However, I
saw that a lot of bridge devices had the register set to 0xFF and I didn't
want to alter them. Also, pci_assign_irq() would still set the interrupt
line register to 0 with or without VMD. Since I didn't want to introduce
issues for non-VMD setups, I decided to keep the change limited to the VMD
driver.

-Nirmal
> 
> - Mani
> 
> [1]
> https://lore.kernel.org/linux-pci/20240801115756.0000272e@linux.intel.com
> 
> > ---
> > v2->v1: Change the execution from fixup.c to vmd.c
> > ---
> >  drivers/pci/controller/vmd.c | 13 +++++++++++++
> >  1 file changed, 13 insertions(+)
> > 
> > diff --git a/drivers/pci/controller/vmd.c
> > b/drivers/pci/controller/vmd.c index a726de0af011..2e9b99969b81
> > 100644 --- a/drivers/pci/controller/vmd.c
> > +++ b/drivers/pci/controller/vmd.c
> > @@ -778,6 +778,18 @@ static int vmd_pm_enable_quirk(struct pci_dev
> > *pdev, void *userdata) return 0;
> >  }
> >  
> > +/*
> > + * Some applications like SPDK reads PCI_INTERRUPT_LINE to decide
> > + * whether INTx is enabled or not. Since VMD doesn't support INTx,
> > + * write 0 to all NVMe devices under VMD.
> > + */
> > +static int vmd_clr_int_line_reg(struct pci_dev *dev, void
> > *userdata) +{
> > +	if(dev->class == PCI_CLASS_STORAGE_EXPRESS)
> > +		pci_write_config_byte(dev, PCI_INTERRUPT_LINE, 0);
> > +	return 0;
> > +}
> > +
> >  static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long
> > features) {
> >  	struct pci_sysdata *sd = &vmd->sysdata;
> > @@ -932,6 +944,7 @@ static int vmd_enable_domain(struct vmd_dev
> > *vmd, unsigned long features) 
> >  	pci_scan_child_bus(vmd->bus);
> >  	vmd_domain_reset(vmd);
> > +	pci_walk_bus(vmd->bus, vmd_clr_int_line_reg, &features);
> >  
> >  	/* When Intel VMD is enabled, the OS does not discover the
> > Root Ports
> >  	 * owned by Intel VMD within the MMCFG space.
> > pci_reset_bus() applies -- 
> > 2.39.1
> > 
> >   
>
Manivannan Sadhasivam Sept. 12, 2024, 2:36 p.m. UTC | #3
On Thu, Aug 22, 2024 at 11:30:10AM -0700, Nirmal Patel wrote:
> On Thu, 22 Aug 2024 15:18:06 +0530
> Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> wrote:
> 
> > On Tue, Aug 20, 2024 at 03:32:13PM -0700, Nirmal Patel wrote:
> > > VMD does not support INTx for devices downstream from a VMD
> > > endpoint. So initialize the PCI_INTERRUPT_LINE to 0 for all NVMe
> > > devices under VMD to ensure other applications don't try to set up
> > > an INTx for them.
> > > 
> > > Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>  
> > 
> > I shared a diff to put it in pci_assign_irq() and you said that you
> > were going to test it [1]. I don't see a reply to that and now you
> > came up with another approach.
> > 
> > What happened inbetween?
> 
> Apologies, I did perform the tests and the patch worked fine. However, I
> was able to see lot of bridge devices had the register set to 0xFF and I
> didn't want to alter them.

You should've either replied to my comment or mentioned it in the changelog.

> Also pci_assign_irg would still set the
> interrupt line register to 0 with or without VMD. Since I didn't want to
> introduce issues for non-VMD setup, I decide to keep the change limited
> only to the VMD.
> 

Sorry, no. The SPDK use case is not specific to VMD, and neither is the issue.
So this should be fixed in the PCI core as I proposed. What if another bridge
also wants to do the same?

- Mani 

> -Nirmal
> > 
> > - Mani
> > 
> > [1]
> > https://lore.kernel.org/linux-pci/20240801115756.0000272e@linux.intel.com
> > 
> > > ---
> > > v2->v1: Change the execution from fixup.c to vmd.c
> > > ---
> > >  drivers/pci/controller/vmd.c | 13 +++++++++++++
> > >  1 file changed, 13 insertions(+)
> > > 
> > > diff --git a/drivers/pci/controller/vmd.c
> > > b/drivers/pci/controller/vmd.c index a726de0af011..2e9b99969b81
> > > 100644 --- a/drivers/pci/controller/vmd.c
> > > +++ b/drivers/pci/controller/vmd.c
> > > @@ -778,6 +778,18 @@ static int vmd_pm_enable_quirk(struct pci_dev
> > > *pdev, void *userdata) return 0;
> > >  }
> > >  
> > > +/*
> > > + * Some applications like SPDK reads PCI_INTERRUPT_LINE to decide
> > > + * whether INTx is enabled or not. Since VMD doesn't support INTx,
> > > + * write 0 to all NVMe devices under VMD.
> > > + */
> > > +static int vmd_clr_int_line_reg(struct pci_dev *dev, void
> > > *userdata) +{
> > > +	if(dev->class == PCI_CLASS_STORAGE_EXPRESS)
> > > +		pci_write_config_byte(dev, PCI_INTERRUPT_LINE, 0);
> > > +	return 0;
> > > +}
> > > +
> > >  static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long
> > > features) {
> > >  	struct pci_sysdata *sd = &vmd->sysdata;
> > > @@ -932,6 +944,7 @@ static int vmd_enable_domain(struct vmd_dev
> > > *vmd, unsigned long features) 
> > >  	pci_scan_child_bus(vmd->bus);
> > >  	vmd_domain_reset(vmd);
> > > +	pci_walk_bus(vmd->bus, vmd_clr_int_line_reg, &features);
> > >  
> > >  	/* When Intel VMD is enabled, the OS does not discover the
> > > Root Ports
> > >  	 * owned by Intel VMD within the MMCFG space.
> > > pci_reset_bus() applies -- 
> > > 2.39.1
> > > 
> > >   
> > 
>
Nirmal Patel Sept. 12, 2024, 3:31 p.m. UTC | #4
On Thu, 12 Sep 2024 20:06:57 +0530
Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> wrote:

> On Thu, Aug 22, 2024 at 11:30:10AM -0700, Nirmal Patel wrote:
> > On Thu, 22 Aug 2024 15:18:06 +0530
> > Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> wrote:
> >   
> > > On Tue, Aug 20, 2024 at 03:32:13PM -0700, Nirmal Patel wrote:  
> > > > VMD does not support INTx for devices downstream from a VMD
> > > > endpoint. So initialize the PCI_INTERRUPT_LINE to 0 for all NVMe
> > > > devices under VMD to ensure other applications don't try to set
> > > > up an INTx for them.
> > > > 
> > > > Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>    
> > > 
> > > I shared a diff to put it in pci_assign_irq() and you said that
> > > you were going to test it [1]. I don't see a reply to that and
> > > now you came up with another approach.
> > > 
> > > What happened inbetween?  
> > 
> > Apologies, I did perform the tests and the patch worked fine.
> > However, I was able to see lot of bridge devices had the register
> > set to 0xFF and I didn't want to alter them.  
> 
> You should've either replied to my comment or mentioned it in the
> changelog.
> 
> > Also pci_assign_irg would still set the
> > interrupt line register to 0 with or without VMD. Since I didn't
> > want to introduce issues for non-VMD setup, I decide to keep the
> > change limited only to the VMD.
> >   
> 
> Sorry no. SPDK usecase is not specific to VMD and so is the issue. So
> this should be fixed in the PCI core as I proposed. What if another
> bridge also wants to do the same?

Okay. Should I clear it for every device that doesn't have map_irq set up, as
you mentioned in your suggested patch, or keep it limited to NVMe devices or
devices with the storage class code?

-nirmal
> 
> - Mani 
> 
> > -Nirmal  
> > > 
> > > - Mani
> > > 
> > > [1]
> > > https://lore.kernel.org/linux-pci/20240801115756.0000272e@linux.intel.com
> > >   
> > > > ---
> > > > v2->v1: Change the execution from fixup.c to vmd.c
> > > > ---
> > > >  drivers/pci/controller/vmd.c | 13 +++++++++++++
> > > >  1 file changed, 13 insertions(+)
> > > > 
> > > > diff --git a/drivers/pci/controller/vmd.c
> > > > b/drivers/pci/controller/vmd.c index a726de0af011..2e9b99969b81
> > > > 100644 --- a/drivers/pci/controller/vmd.c
> > > > +++ b/drivers/pci/controller/vmd.c
> > > > @@ -778,6 +778,18 @@ static int vmd_pm_enable_quirk(struct
> > > > pci_dev *pdev, void *userdata) return 0;
> > > >  }
> > > >  
> > > > +/*
> > > > + * Some applications like SPDK reads PCI_INTERRUPT_LINE to
> > > > decide
> > > > + * whether INTx is enabled or not. Since VMD doesn't support
> > > > INTx,
> > > > + * write 0 to all NVMe devices under VMD.
> > > > + */
> > > > +static int vmd_clr_int_line_reg(struct pci_dev *dev, void
> > > > *userdata) +{
> > > > +	if(dev->class == PCI_CLASS_STORAGE_EXPRESS)
> > > > +		pci_write_config_byte(dev, PCI_INTERRUPT_LINE,
> > > > 0);
> > > > +	return 0;
> > > > +}
> > > > +
> > > >  static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long
> > > > features) {
> > > >  	struct pci_sysdata *sd = &vmd->sysdata;
> > > > @@ -932,6 +944,7 @@ static int vmd_enable_domain(struct vmd_dev
> > > > *vmd, unsigned long features) 
> > > >  	pci_scan_child_bus(vmd->bus);
> > > >  	vmd_domain_reset(vmd);
> > > > +	pci_walk_bus(vmd->bus, vmd_clr_int_line_reg,
> > > > &features); 
> > > >  	/* When Intel VMD is enabled, the OS does not discover
> > > > the Root Ports
> > > >  	 * owned by Intel VMD within the MMCFG space.
> > > > pci_reset_bus() applies -- 
> > > > 2.39.1
> > > > 
> > > >     
> > >   
> >   
>
Manivannan Sadhasivam Sept. 12, 2024, 4:47 p.m. UTC | #5
On Thu, Sep 12, 2024 at 08:31:00AM -0700, Nirmal Patel wrote:
> On Thu, 12 Sep 2024 20:06:57 +0530
> Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> wrote:
> 
> > On Thu, Aug 22, 2024 at 11:30:10AM -0700, Nirmal Patel wrote:
> > > On Thu, 22 Aug 2024 15:18:06 +0530
> > > Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> wrote:
> > >   
> > > > On Tue, Aug 20, 2024 at 03:32:13PM -0700, Nirmal Patel wrote:  
> > > > > VMD does not support INTx for devices downstream from a VMD
> > > > > endpoint. So initialize the PCI_INTERRUPT_LINE to 0 for all NVMe
> > > > > devices under VMD to ensure other applications don't try to set
> > > > > up an INTx for them.
> > > > > 
> > > > > Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>    
> > > > 
> > > > I shared a diff to put it in pci_assign_irq() and you said that
> > > > you were going to test it [1]. I don't see a reply to that and
> > > > now you came up with another approach.
> > > > 
> > > > What happened inbetween?  
> > > 
> > > Apologies, I did perform the tests and the patch worked fine.
> > > However, I was able to see lot of bridge devices had the register
> > > set to 0xFF and I didn't want to alter them.  
> > 
> > You should've either replied to my comment or mentioned it in the
> > changelog.
> > 
> > > Also pci_assign_irg would still set the
> > > interrupt line register to 0 with or without VMD. Since I didn't
> > > want to introduce issues for non-VMD setup, I decide to keep the
> > > change limited only to the VMD.
> > >   
> > 
> > Sorry no. SPDK usecase is not specific to VMD and so is the issue. So
> > this should be fixed in the PCI core as I proposed. What if another
> > bridge also wants to do the same?
> 
> Okay. Should I clear every device that doesn't have map_irq setup like
> you mentioned in your suggested patch or keep it to NVMe or devices
> with storage class code?
> 

For all the devices.

- Mani

> -nirmal
> > 
> > - Mani 
> > 
> > > -Nirmal  
> > > > 
> > > > - Mani
> > > > 
> > > > [1]
> > > > https://lore.kernel.org/linux-pci/20240801115756.0000272e@linux.intel.com
> > > >   
> > > > > ---
> > > > > v2->v1: Change the execution from fixup.c to vmd.c
> > > > > ---
> > > > >  drivers/pci/controller/vmd.c | 13 +++++++++++++
> > > > >  1 file changed, 13 insertions(+)
> > > > > 
> > > > > diff --git a/drivers/pci/controller/vmd.c
> > > > > b/drivers/pci/controller/vmd.c index a726de0af011..2e9b99969b81
> > > > > 100644 --- a/drivers/pci/controller/vmd.c
> > > > > +++ b/drivers/pci/controller/vmd.c
> > > > > @@ -778,6 +778,18 @@ static int vmd_pm_enable_quirk(struct
> > > > > pci_dev *pdev, void *userdata) return 0;
> > > > >  }
> > > > >  
> > > > > +/*
> > > > > + * Some applications like SPDK reads PCI_INTERRUPT_LINE to
> > > > > decide
> > > > > + * whether INTx is enabled or not. Since VMD doesn't support
> > > > > INTx,
> > > > > + * write 0 to all NVMe devices under VMD.
> > > > > + */
> > > > > +static int vmd_clr_int_line_reg(struct pci_dev *dev, void
> > > > > *userdata) +{
> > > > > +	if(dev->class == PCI_CLASS_STORAGE_EXPRESS)
> > > > > +		pci_write_config_byte(dev, PCI_INTERRUPT_LINE,
> > > > > 0);
> > > > > +	return 0;
> > > > > +}
> > > > > +
> > > > >  static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long
> > > > > features) {
> > > > >  	struct pci_sysdata *sd = &vmd->sysdata;
> > > > > @@ -932,6 +944,7 @@ static int vmd_enable_domain(struct vmd_dev
> > > > > *vmd, unsigned long features) 
> > > > >  	pci_scan_child_bus(vmd->bus);
> > > > >  	vmd_domain_reset(vmd);
> > > > > +	pci_walk_bus(vmd->bus, vmd_clr_int_line_reg,
> > > > > &features); 
> > > > >  	/* When Intel VMD is enabled, the OS does not discover
> > > > > the Root Ports
> > > > >  	 * owned by Intel VMD within the MMCFG space.
> > > > > pci_reset_bus() applies -- 
> > > > > 2.39.1
> > > > > 
> > > > >     
> > > >   
> > >   
> > 
>
Dan Williams Sept. 12, 2024, 5:11 p.m. UTC | #6
Manivannan Sadhasivam wrote:
> On Thu, Aug 22, 2024 at 11:30:10AM -0700, Nirmal Patel wrote:
> > On Thu, 22 Aug 2024 15:18:06 +0530
> > Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> wrote:
> > 
> > > On Tue, Aug 20, 2024 at 03:32:13PM -0700, Nirmal Patel wrote:
> > > > VMD does not support INTx for devices downstream from a VMD
> > > > endpoint. So initialize the PCI_INTERRUPT_LINE to 0 for all NVMe
> > > > devices under VMD to ensure other applications don't try to set up
> > > > an INTx for them.
> > > > 
> > > > Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>  
> > > 
> > > I shared a diff to put it in pci_assign_irq() and you said that you
> > > were going to test it [1]. I don't see a reply to that and now you
> > > came up with another approach.
> > > 
> > > What happened inbetween?
> > 
> > Apologies, I did perform the tests and the patch worked fine. However, I
> > was able to see lot of bridge devices had the register set to 0xFF and I
> > didn't want to alter them.
> 
> You should've either replied to my comment or mentioned it in the changelog.
> 
> > Also pci_assign_irg would still set the
> > interrupt line register to 0 with or without VMD. Since I didn't want to
> > introduce issues for non-VMD setup, I decide to keep the change limited
> > only to the VMD.
> > 
> 
> Sorry no. SPDK usecase is not specific to VMD and so is the issue. So this
> should be fixed in the PCI core as I proposed. What if another bridge also wants
> to do the same?

Going to say this rather harshly, but there is no conceivable universe I
can imagine where the Linux PCI core should be bothered with the
idiosyncrasies of VMD. VMD is not a PCI bridge.

SPDK is fully capable of doing this fixup itself in the presence of VMD.
Only if this problem becomes apparent in more than just a kernel-bypass
development kit should the kernel worry about it, and even then the fixup
must be contained to the VMD driver, with all its other workarounds that
try to get back to standards-compliant PCI bridge behavior.
Manivannan Sadhasivam Sept. 12, 2024, 5:25 p.m. UTC | #7
On Thu, Sep 12, 2024 at 10:11:25AM -0700, Dan Williams wrote:
> Manivannan Sadhasivam wrote:
> > On Thu, Aug 22, 2024 at 11:30:10AM -0700, Nirmal Patel wrote:
> > > On Thu, 22 Aug 2024 15:18:06 +0530
> > > Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> wrote:
> > > 
> > > > On Tue, Aug 20, 2024 at 03:32:13PM -0700, Nirmal Patel wrote:
> > > > > VMD does not support INTx for devices downstream from a VMD
> > > > > endpoint. So initialize the PCI_INTERRUPT_LINE to 0 for all NVMe
> > > > > devices under VMD to ensure other applications don't try to set up
> > > > > an INTx for them.
> > > > > 
> > > > > Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>  
> > > > 
> > > > I shared a diff to put it in pci_assign_irq() and you said that you
> > > > were going to test it [1]. I don't see a reply to that and now you
> > > > came up with another approach.
> > > > 
> > > > What happened inbetween?
> > > 
> > > Apologies, I did perform the tests and the patch worked fine. However, I
> > > was able to see lot of bridge devices had the register set to 0xFF and I
> > > didn't want to alter them.
> > 
> > You should've either replied to my comment or mentioned it in the changelog.
> > 
> > > Also pci_assign_irg would still set the
> > > interrupt line register to 0 with or without VMD. Since I didn't want to
> > > introduce issues for non-VMD setup, I decide to keep the change limited
> > > only to the VMD.
> > > 
> > 
> > Sorry no. SPDK usecase is not specific to VMD and so is the issue. So this
> > should be fixed in the PCI core as I proposed. What if another bridge also wants
> > to do the same?
> 
> Going to say this rather harshly, but there is no conceivable universe I
> can imagine where the Linux PCI core should be bothered with the
> idiosyncracies of VMD. VMD is not a PCI bridge.
> 

I don't think the issue should be constrained to VMD only. Based on my
conversation with Nirmal [1], I understood that it is SPDK that makes the wrong
assumption when the device's PCI_INTERRUPT_LINE is non-zero (and I assumed that
other applications could do the same). In that case, how can it be classified
as an "idiosyncrasy" of VMD? SPDK is not tied to VMD, is it?

- Mani

[1] https://lore.kernel.org/linux-pci/20240730052830.GA3122@thinkpad/

> SPDK is fully capable of doing this fixup itself in the presence of VMD.
> Unless and until this problem is apparent in more than just a kernel
> bypass development kit should the kernel worry about it, and even then
> the fixup must be contained to the VMD driver with all its other
> workarounds that to try to get back to standards compliant PCI bridge
> behavior.
Dan Williams Sept. 12, 2024, 6:10 p.m. UTC | #8
Manivannan Sadhasivam wrote:
[..]
> I don't think the issue should be constrained to VMD only. Based on my
> conversation with Nirmal [1], I understood that it is SPDK that makes wrong
> assumption if the device's PCI_INTERRUPT_LINE is non-zero (and I assumed that
> other application could do the same).

I am skeptical one can find an example of an application that gets
similarly confused. SPDK is not a typical consumer of PCI device
information.

> In that case, how it can be classified as the "idiosyncracy" of VMD?

If VMD were a typical PCIe switch, firmware would have already fixed up
these values. In fact this problem could likely also be fixed in
platform firmware, but the history of VMD is to leak workaround after
workaround into the kernel.

> SPDK is not tied to VMD, isn't it?

It is not, but SPDK replaces significant pieces of the kernel with
userspace drivers. In this respect a VMD driver quirk in SPDK is no
different than the NVME driver quirks it already needs to carry in its
userspace NVME driver.

Now, if you told me that the damage was more widespread than a project
that is meant to replace kernel drivers, then a kernel fix should be
explored. Until then, let SPDK carry the quirk until it becomes clear
that there are practical examples of wider damage.
Nirmal Patel Sept. 12, 2024, 7:15 p.m. UTC | #9
On Thu, 12 Sep 2024 11:10:21 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> Manivannan Sadhasivam wrote:
> [..]
> > I don't think the issue should be constrained to VMD only. Based on
> > my conversation with Nirmal [1], I understood that it is SPDK that
> > makes wrong assumption if the device's PCI_INTERRUPT_LINE is
> > non-zero (and I assumed that other application could do the same).  
> 
> I am skeptical one can find an example of an application that gets
> similarly confused. SPDK is not a typical consumer of PCI device
> information.
> 
> > In that case, how it can be classified as the "idiosyncracy" of
> > VMD?  
> 
> If VMD were a typical PCIe switch, firmware would have already fixed
> up these values. In fact this problem could likely also be fixed in
> platform firmware, but the history of VMD is to leak workaround after
> workaround into the kernel.

This is not VMD leaking a workaround into the kernel; rather, the patch is
trying to keep the fix limited to the VMD driver. I tried over 10 different
NVMe devices and only one vendor has the PCI_INTERRUPT_LINE register set to
0xFF. The platform firmware doesn't change that with or without VMD.

> 
> > SPDK is not tied to VMD, isn't it?  
> 
> It is not, but SPDK replaces significant pieces of the kernel with
> userspace drivers. In this respect a VMD driver quirk in SPDK is no
> different than the NVME driver quirks it already needs to carry in its
> userspace NVME driver.
> 
> Now, if you told me that the damage was more widespread than a project
> that is meant to replace kernel drivers, then a kernel fix should be
> explored. Until then, let SPDK carry the quirk until it becomes clear
> that there are practical examples of wider damage.
Dan Williams Sept. 13, 2024, 12:01 a.m. UTC | #10
Nirmal Patel wrote:
> On Thu, 12 Sep 2024 11:10:21 -0700
> Dan Williams <dan.j.williams@intel.com> wrote:
> 
> > Manivannan Sadhasivam wrote:
> > [..]
> > > I don't think the issue should be constrained to VMD only. Based on
> > > my conversation with Nirmal [1], I understood that it is SPDK that
> > > makes wrong assumption if the device's PCI_INTERRUPT_LINE is
> > > non-zero (and I assumed that other application could do the same).  
> > 
> > I am skeptical one can find an example of an application that gets
> > similarly confused. SPDK is not a typical consumer of PCI device
> > information.
> > 
> > > In that case, how it can be classified as the "idiosyncracy" of
> > > VMD?  
> > 
> > If VMD were a typical PCIe switch, firmware would have already fixed
> > up these values. In fact this problem could likely also be fixed in
> > platform firmware, but the history of VMD is to leak workaround after
> > workaround into the kernel.
> 
> This is not VMD leaking workaround in kernel, rather the patch is
> trying to keep fix limited to VMD driver.

Oh, ok, I see that now, however...

> I tried over 10 different NVMes and only 1 vendor has
> PCI_INTERRUPT_LINE register set to 0xFF.  The platform firmware
> doesn't change that with or without VMD.

...SPDK is still asserting that it wants to be the NVME host driver in
userspace. As part of that it gets to keep all the pieces and must
realize that a device that has MSI/MSI-X enabled is not using INTx
regardless of that register value.

Do not force the kernel to abide by SPDK expectations when the PCI core
/ Linux-NVME driver contract does not need the PCI_INTERRUPT_LINE
cleared. If SPDK is taking over NVME, it gets to take over *all* of it.
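
To illustrate the point, a minimal userspace sketch (not SPDK code; placeholder BDF; assumes a little-endian host when reading 16-bit fields from the sysfs config file) that decides INTx vs MSI/MSI-X from the capability list rather than from PCI_INTERRUPT_LINE:

#include <stdio.h>
#include <stdint.h>

static int read_cfg(FILE *f, long off, void *buf, size_t len)
{
	return fseek(f, off, SEEK_SET) == 0 && fread(buf, 1, len, f) == len;
}

int main(void)
{
	/* Placeholder BDF; a real tool would pick the devices it owns. */
	FILE *f = fopen("/sys/bus/pci/devices/0000:01:00.0/config", "rb");
	uint16_t status, ctrl;
	uint8_t cap, id;
	int hops = 48;	/* guard against a malformed capability list */

	if (!f || !read_cfg(f, 0x06, &status, 2) || !(status & 0x10))
		return 1;	/* no device or no capability list */

	read_cfg(f, 0x34, &cap, 1);	/* PCI_CAPABILITY_LIST pointer */
	while (cap && hops--) {
		read_cfg(f, cap, &id, 1);
		read_cfg(f, cap + 2, &ctrl, 2);	/* Message Control word */
		if ((id == 0x05 && (ctrl & 0x0001)) ||	/* MSI enable */
		    (id == 0x11 && (ctrl & 0x8000))) {	/* MSI-X enable */
			printf("MSI/MSI-X enabled; INTx is not in use\n");
			fclose(f);
			return 0;
		}
		read_cfg(f, cap + 1, &cap, 1);	/* next capability */
	}
	printf("no MSI/MSI-X enabled; legacy INTx may be in use\n");
	fclose(f);
	return 0;
}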
Manivannan Sadhasivam Sept. 13, 2024, 10:55 a.m. UTC | #11
On Thu, Sep 12, 2024 at 05:01:57PM -0700, Dan Williams wrote:
> Nirmal Patel wrote:
> > On Thu, 12 Sep 2024 11:10:21 -0700
> > Dan Williams <dan.j.williams@intel.com> wrote:
> > 
> > > Manivannan Sadhasivam wrote:
> > > [..]
> > > > I don't think the issue should be constrained to VMD only. Based on
> > > > my conversation with Nirmal [1], I understood that it is SPDK that
> > > > makes wrong assumption if the device's PCI_INTERRUPT_LINE is
> > > > non-zero (and I assumed that other application could do the same).  
> > > 
> > > I am skeptical one can find an example of an application that gets
> > > similarly confused. SPDK is not a typical consumer of PCI device
> > > information.
> > > 
> > > > In that case, how it can be classified as the "idiosyncracy" of
> > > > VMD?  
> > > 
> > > If VMD were a typical PCIe switch, firmware would have already fixed
> > > up these values. In fact this problem could likely also be fixed in
> > > platform firmware, but the history of VMD is to leak workaround after
> > > workaround into the kernel.
> > 
> > This is not VMD leaking workaround in kernel, rather the patch is
> > trying to keep fix limited to VMD driver.
> 
> Oh, ok, I see that now, however...
> 
> > I tried over 10 different NVMes and only 1 vendor has
> > PCI_INTERRUPT_LINE register set to 0xFF.  The platform firmware
> > doesn't change that with or without VMD.
> 
> ...SPDK is still asserting that it wants to be the NVME host driver in
> userspace. As part of that it gets to keep all the pieces and must
> realize that a device that has MSI/-X enabled is not using INTx
> regardless of that register value.
> 
> Do not force the kernel to abide by SPDK expectations when the PCI core
> / Linux-NVME driver contract does not need the PCI_INTERRUPT_LINE
> cleared. If SPDK is taking over NVME, it gets to take over *all* of it.

In that case, we do not need a fix at all (even for VMD). My initial assumption
was that some other userspace applications might also behave the same way as
SPDK. But I agree with you that unless we hear about them, it doesn't warrant a
fix in the kernel. Thanks!

- Mani
Nirmal Patel Sept. 13, 2024, 4:02 p.m. UTC | #12
On Fri, 13 Sep 2024 16:25:41 +0530
Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> wrote:

> On Thu, Sep 12, 2024 at 05:01:57PM -0700, Dan Williams wrote:
> > Nirmal Patel wrote:  
> > > On Thu, 12 Sep 2024 11:10:21 -0700
> > > Dan Williams <dan.j.williams@intel.com> wrote:
> > >   
> > > > Manivannan Sadhasivam wrote:
> > > > [..]  
> > > > > I don't think the issue should be constrained to VMD only.
> > > > > Based on my conversation with Nirmal [1], I understood that
> > > > > it is SPDK that makes wrong assumption if the device's
> > > > > PCI_INTERRUPT_LINE is non-zero (and I assumed that other
> > > > > application could do the same).    
> > > > 
> > > > I am skeptical one can find an example of an application that
> > > > gets similarly confused. SPDK is not a typical consumer of PCI
> > > > device information.
> > > >   
> > > > > In that case, how it can be classified as the "idiosyncracy"
> > > > > of VMD?    
> > > > 
> > > > If VMD were a typical PCIe switch, firmware would have already
> > > > fixed up these values. In fact this problem could likely also
> > > > be fixed in platform firmware, but the history of VMD is to
> > > > leak workaround after workaround into the kernel.  
> > > 
> > > This is not VMD leaking workaround in kernel, rather the patch is
> > > trying to keep fix limited to VMD driver.  
> > 
> > Oh, ok, I see that now, however...
> >   
> > > I tried over 10 different NVMes and only 1 vendor has
> > > PCI_INTERRUPT_LINE register set to 0xFF.  The platform firmware
> > > doesn't change that with or without VMD.  
> > 
> > ...SPDK is still asserting that it wants to be the NVME host driver
> > in userspace. As part of that it gets to keep all the pieces and
> > must realize that a device that has MSI/-X enabled is not using INTx
> > regardless of that register value.
> > 
> > Do not force the kernel to abide by SPDK expectations when the PCI
> > core / Linux-NVME driver contract does not need the
> > PCI_INTERRUPT_LINE cleared. If SPDK is taking over NVME, it gets to
> > take over *all* of it.  
> 
> In that case, we do not need a fix at all (even for VMD). My initial
> assumption was that, some other userspace applications may also
> behave the same way as SPDK. But I agree with you that unless we hear
> about them, it doesn't warrant a fix in the kernel. Thanks!

Okay, we can drop this patch for now. Thanks.
> 
> - Mani
>

Patch

diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index a726de0af011..2e9b99969b81 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -778,6 +778,18 @@  static int vmd_pm_enable_quirk(struct pci_dev *pdev, void *userdata)
 	return 0;
 }
 
+/*
+ * Some applications like SPDK read PCI_INTERRUPT_LINE to decide
+ * whether INTx is enabled or not. Since VMD doesn't support INTx,
+ * write 0 to the register for all NVMe devices under VMD.
+ */
+static int vmd_clr_int_line_reg(struct pci_dev *dev, void *userdata)
+{
+	if (dev->class == PCI_CLASS_STORAGE_EXPRESS)
+		pci_write_config_byte(dev, PCI_INTERRUPT_LINE, 0);
+	return 0;
+}
+
 static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
 {
 	struct pci_sysdata *sd = &vmd->sysdata;
@@ -932,6 +944,7 @@  static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
 
 	pci_scan_child_bus(vmd->bus);
 	vmd_domain_reset(vmd);
+	pci_walk_bus(vmd->bus, vmd_clr_int_line_reg, &features);
 
 	/* When Intel VMD is enabled, the OS does not discover the Root Ports
 	 * owned by Intel VMD within the MMCFG space. pci_reset_bus() applies