
[v3] PCI: vmd: always enable host bridge hotplug support flags

Message ID 20240408183927.135-1-paul.m.stillwell.jr@intel.com (mailing list archive)
State Changes Requested
Delegated to: Bjorn Helgaas
Series [v3] PCI: vmd: always enable host bridge hotplug support flags

Commit Message

Paul M Stillwell Jr April 8, 2024, 6:39 p.m. UTC
Commit 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe features") added
code to copy the _OSC flags from the root bridge to the host bridge for
each VMD device because the AER bits were not being set correctly, which
caused an AER interrupt storm for certain NVMe devices.

This works fine in bare-metal environments, but it causes problems when
the vmd driver runs in a hypervisor environment. In a hypervisor, all the
_OSC bits are 0 regardless of what the underlying hardware indicates. This
is a problem for VMD users because a user who enables VMD *always* wants
hotplug support enabled. To solve this, the vmd driver now always enables
the hotplug bits in the host bridge structure for each VMD device.

Fixes: 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe features")
Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
---
 drivers/pci/controller/vmd.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

Comments

Bjorn Helgaas May 2, 2024, 10:08 p.m. UTC | #1
On Mon, Apr 08, 2024 at 11:39:27AM -0700, Paul M Stillwell Jr wrote:
> Commit 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe features") added
> code to copy the _OSC flags from the root bridge to the host bridge for each
> vmd device because the AER bits were not being set correctly which was
> causing an AER interrupt storm for certain NVMe devices.
> 
> This works fine in bare metal environments, but causes problems when the
> vmd driver is run in a hypervisor environment. In a hypervisor all the
> _OSC bits are 0 despite what the underlying hardware indicates. This is
> a problem for vmd users because if vmd is enabled the user *always*
> wants hotplug support enabled. To solve this issue the vmd driver always
> enables the hotplug bits in the host bridge structure for each vmd.
> 
> Fixes: 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe features")
> Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
> Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
> ---
>  drivers/pci/controller/vmd.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> index 87b7856f375a..583b10bd5eb7 100644
> --- a/drivers/pci/controller/vmd.c
> +++ b/drivers/pci/controller/vmd.c
> @@ -730,8 +730,14 @@ static int vmd_alloc_irqs(struct vmd_dev *vmd)
>  static void vmd_copy_host_bridge_flags(struct pci_host_bridge *root_bridge,
>  				       struct pci_host_bridge *vmd_bridge)
>  {
> -	vmd_bridge->native_pcie_hotplug = root_bridge->native_pcie_hotplug;
> -	vmd_bridge->native_shpc_hotplug = root_bridge->native_shpc_hotplug;
> +	/*
> +	 * There is an issue when the vmd driver is running within a hypervisor
> +	 * because all of the _OSC bits are 0 in that case. This disables
> +	 * hotplug support, but users who enable VMD in their BIOS always want
> +	 * hotplug support, so always enable it.
> +	 */
> +	vmd_bridge->native_pcie_hotplug = 1;
> +	vmd_bridge->native_shpc_hotplug = 1;

Deferred for now because I think we need to figure out how to set all
these bits the same, or at least with a better algorithm than "here's
what we want in this environment."

Extended discussion about this at
https://lore.kernel.org/r/20240417201542.102-1-paul.m.stillwell.jr@intel.com

>  	vmd_bridge->native_aer = root_bridge->native_aer;
>  	vmd_bridge->native_pme = root_bridge->native_pme;
>  	vmd_bridge->native_ltr = root_bridge->native_ltr;
> -- 
> 2.39.1
>
Paul M Stillwell Jr May 2, 2024, 10:38 p.m. UTC | #2
On 5/2/2024 3:08 PM, Bjorn Helgaas wrote:
> On Mon, Apr 08, 2024 at 11:39:27AM -0700, Paul M Stillwell Jr wrote:
>> Commit 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe features") added
>> code to copy the _OSC flags from the root bridge to the host bridge for each
>> vmd device because the AER bits were not being set correctly which was
>> causing an AER interrupt storm for certain NVMe devices.
>>
>> This works fine in bare metal environments, but causes problems when the
>> vmd driver is run in a hypervisor environment. In a hypervisor all the
>> _OSC bits are 0 despite what the underlying hardware indicates. This is
>> a problem for vmd users because if vmd is enabled the user *always*
>> wants hotplug support enabled. To solve this issue the vmd driver always
>> enables the hotplug bits in the host bridge structure for each vmd.
>>
>> Fixes: 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe features")
>> Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
>> Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
>> ---
>>   drivers/pci/controller/vmd.c | 10 ++++++++--
>>   1 file changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
>> index 87b7856f375a..583b10bd5eb7 100644
>> --- a/drivers/pci/controller/vmd.c
>> +++ b/drivers/pci/controller/vmd.c
>> @@ -730,8 +730,14 @@ static int vmd_alloc_irqs(struct vmd_dev *vmd)
>>   static void vmd_copy_host_bridge_flags(struct pci_host_bridge *root_bridge,
>>   				       struct pci_host_bridge *vmd_bridge)
>>   {
>> -	vmd_bridge->native_pcie_hotplug = root_bridge->native_pcie_hotplug;
>> -	vmd_bridge->native_shpc_hotplug = root_bridge->native_shpc_hotplug;
>> +	/*
>> +	 * there is an issue when the vmd driver is running within a hypervisor
>> +	 * because all of the _OSC bits are 0 in that case. this disables
>> +	 * hotplug support, but users who enable VMD in their BIOS always want
>> +	 * hotplug suuport so always enable it.
>> +	 */
>> +	vmd_bridge->native_pcie_hotplug = 1;
>> +	vmd_bridge->native_shpc_hotplug = 1;
> 
> Deferred for now because I think we need to figure out how to set all
> these bits the same, or at least with a better algorithm than "here's
> what we want in this environment."
> 
> Extended discussion about this at
> https://lore.kernel.org/r/20240417201542.102-1-paul.m.stillwell.jr@intel.com
> 

That's OK by me. I thought where we left it was that if we could find a
solution to the Correctable Errors from the original issue, then maybe we
could revert 04b12ef163d1.

I'm not sure I would know if a patch that fixes the Correctable Errors
comes in... We have a test case we would like to run that predates
04b12ef163d1 (BIOS has AER disabled and we hotplug a disk, which results
in AER interrupts), so we would be curious whether the issues we saw
before go away with a new patch for Correctable Errors.

Paul
>>   	vmd_bridge->native_aer = root_bridge->native_aer;
>>   	vmd_bridge->native_pme = root_bridge->native_pme;
>>   	vmd_bridge->native_ltr = root_bridge->native_ltr;
>> -- 
>> 2.39.1
>>
>
Bjorn Helgaas May 2, 2024, 10:56 p.m. UTC | #3
On Thu, May 02, 2024 at 03:38:00PM -0700, Paul M Stillwell Jr wrote:
> On 5/2/2024 3:08 PM, Bjorn Helgaas wrote:
> > On Mon, Apr 08, 2024 at 11:39:27AM -0700, Paul M Stillwell Jr wrote:
> > > Commit 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe features") added
> > > code to copy the _OSC flags from the root bridge to the host bridge for each
> > > vmd device because the AER bits were not being set correctly which was
> > > causing an AER interrupt storm for certain NVMe devices.
> > > 
> > > This works fine in bare metal environments, but causes problems when the
> > > vmd driver is run in a hypervisor environment. In a hypervisor all the
> > > _OSC bits are 0 despite what the underlying hardware indicates. This is
> > > a problem for vmd users because if vmd is enabled the user *always*
> > > wants hotplug support enabled. To solve this issue the vmd driver always
> > > enables the hotplug bits in the host bridge structure for each vmd.
> > > 
> > > Fixes: 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe features")
> > > Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
> > > Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
> > > ---
> > >   drivers/pci/controller/vmd.c | 10 ++++++++--
> > >   1 file changed, 8 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> > > index 87b7856f375a..583b10bd5eb7 100644
> > > --- a/drivers/pci/controller/vmd.c
> > > +++ b/drivers/pci/controller/vmd.c
> > > @@ -730,8 +730,14 @@ static int vmd_alloc_irqs(struct vmd_dev *vmd)
> > >   static void vmd_copy_host_bridge_flags(struct pci_host_bridge *root_bridge,
> > >   				       struct pci_host_bridge *vmd_bridge)
> > >   {
> > > -	vmd_bridge->native_pcie_hotplug = root_bridge->native_pcie_hotplug;
> > > -	vmd_bridge->native_shpc_hotplug = root_bridge->native_shpc_hotplug;
> > > +	/*
> > > +	 * there is an issue when the vmd driver is running within a hypervisor
> > > +	 * because all of the _OSC bits are 0 in that case. this disables
> > > +	 * hotplug support, but users who enable VMD in their BIOS always want
> > > +	 * hotplug suuport so always enable it.
> > > +	 */
> > > +	vmd_bridge->native_pcie_hotplug = 1;
> > > +	vmd_bridge->native_shpc_hotplug = 1;
> > 
> > Deferred for now because I think we need to figure out how to set all
> > these bits the same, or at least with a better algorithm than "here's
> > what we want in this environment."
> > 
> > Extended discussion about this at
> > https://lore.kernel.org/r/20240417201542.102-1-paul.m.stillwell.jr@intel.com
> 
> That's ok by me. I thought where we left it was that if we could find a
> solution to the Correctable Errors from the original issue that maybe we
> could revert 04b12ef163d1.
> 
> I'm not sure I would know if a patch that fixes the Correctable Errors comes
> in... We have a test case we would like to test against that was pre
> 04b12ef163d1 (BIOS has AER disabled and we hotplug a disk which results in
> AER interrupts) so we would be curious if the issues we saw before goes away
> with a new patch for Correctable Errors.

My current theory is that there's some issue with that particular
Samsung NVMe device that causes the Correctable Error flood.  Kai-Heng
says they happen even with VMD disabled.

And there are other reports that don't seem to involve VMD but do
involve this NVMe device ([144d:a80a]):

  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1852420
  https://forums.unraid.net/topic/113521-constant-errors-on-logs-after-nvme-upgrade/
  https://forums.unraid.net/topic/118286-nvme-drives-throwing-errors-filling-logs-instantly-how-to-resolve/
  https://forum.proxmox.com/threads/pve-kernel-panics-on-reboots.144481/
  https://www.eevblog.com/forum/general-computing/linux-mint-21-02-clone-replace-1tb-nvme-with-a-2tb-nvme/

NVMe has weird power management stuff, so it's always possible we're
doing something wrong in a driver.

But I think we really need to handle Correctable Errors better by:

  - Possibly having drivers mask errors if they know about defects
    (a rough sketch of this follows the list)
  - Making the log messages less alarming, e.g., a single-line report
  - Rate-limiting them so they're never overwhelming
  - Maybe automatically masking them in the PCI core to avoid interrupts
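
For the first item, a per-device quirk (it would normally live in
drivers/pci/quirks.c) could look roughly like the sketch below. This is
untested and illustrative only: no such quirk exists today, and the
particular mask bits chosen here are not based on any analysis of the
device.

static void quirk_mask_samsung_cor_errors(struct pci_dev *dev)
{
	int aer = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);
	u32 mask;

	if (!aer)
		return;

	/* Mask the Correctable Errors this device is known to flood */
	pci_read_config_dword(dev, aer + PCI_ERR_COR_MASK, &mask);
	mask |= PCI_ERR_COR_RCVR | PCI_ERR_COR_BAD_TLP | PCI_ERR_COR_BAD_DLLP;
	pci_write_config_dword(dev, aer + PCI_ERR_COR_MASK, mask);
}
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_SAMSUNG, 0xa80a,
			quirk_mask_samsung_cor_errors);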

> > >   	vmd_bridge->native_aer = root_bridge->native_aer;
> > >   	vmd_bridge->native_pme = root_bridge->native_pme;
> > >   	vmd_bridge->native_ltr = root_bridge->native_ltr;
> > > -- 
> > > 2.39.1
> > > 
> > 
>
Nirmal Patel June 20, 2024, 10:41 p.m. UTC | #4
On Thu, 2 May 2024 17:56:08 -0500
Bjorn Helgaas <helgaas@kernel.org> wrote:

> On Thu, May 02, 2024 at 03:38:00PM -0700, Paul M Stillwell Jr wrote:
> > On 5/2/2024 3:08 PM, Bjorn Helgaas wrote:  
> > > On Mon, Apr 08, 2024 at 11:39:27AM -0700, Paul M Stillwell Jr
> > > wrote:  
> > > > Commit 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe
> > > > features") added code to copy the _OSC flags from the root
> > > > bridge to the host bridge for each vmd device because the AER
> > > > bits were not being set correctly which was causing an AER
> > > > interrupt storm for certain NVMe devices.
> > > > 
> > > > This works fine in bare metal environments, but causes problems
> > > > when the vmd driver is run in a hypervisor environment. In a
> > > > hypervisor all the _OSC bits are 0 despite what the underlying
> > > > hardware indicates. This is a problem for vmd users because if
> > > > vmd is enabled the user *always* wants hotplug support enabled.
> > > > To solve this issue the vmd driver always enables the hotplug
> > > > bits in the host bridge structure for each vmd.
> > > > 
> > > > Fixes: 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe
> > > > features") Signed-off-by: Nirmal Patel
> > > > <nirmal.patel@linux.intel.com> Signed-off-by: Paul M Stillwell
> > > > Jr <paul.m.stillwell.jr@intel.com> ---
> > > >   drivers/pci/controller/vmd.c | 10 ++++++++--
> > > >   1 file changed, 8 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/drivers/pci/controller/vmd.c
> > > > b/drivers/pci/controller/vmd.c index 87b7856f375a..583b10bd5eb7
> > > > 100644 --- a/drivers/pci/controller/vmd.c
> > > > +++ b/drivers/pci/controller/vmd.c
> > > > @@ -730,8 +730,14 @@ static int vmd_alloc_irqs(struct vmd_dev
> > > > *vmd) static void vmd_copy_host_bridge_flags(struct
> > > > pci_host_bridge *root_bridge, struct pci_host_bridge
> > > > *vmd_bridge) {
> > > > -	vmd_bridge->native_pcie_hotplug =
> > > > root_bridge->native_pcie_hotplug;
> > > > -	vmd_bridge->native_shpc_hotplug =
> > > > root_bridge->native_shpc_hotplug;
> > > > +	/*
> > > > +	 * there is an issue when the vmd driver is running
> > > > within a hypervisor
> > > > +	 * because all of the _OSC bits are 0 in that case.
> > > > this disables
> > > > +	 * hotplug support, but users who enable VMD in their
> > > > BIOS always want
> > > > +	 * hotplug suuport so always enable it.
> > > > +	 */
> > > > +	vmd_bridge->native_pcie_hotplug = 1;
> > > > +	vmd_bridge->native_shpc_hotplug = 1;  
> > > 
> > > Deferred for now because I think we need to figure out how to set
> > > all these bits the same, or at least with a better algorithm than
> > > "here's what we want in this environment."
> > > 
> > > Extended discussion about this at
> > > https://lore.kernel.org/r/20240417201542.102-1-paul.m.stillwell.jr@intel.com
> > >  
> > 
> > That's ok by me. I thought where we left it was that if we could
> > find a solution to the Correctable Errors from the original issue
> > that maybe we could revert 04b12ef163d1.
> > 
> > I'm not sure I would know if a patch that fixes the Correctable
> > Errors comes in... We have a test case we would like to test
> > against that was pre 04b12ef163d1 (BIOS has AER disabled and we
> > hotplug a disk which results in AER interrupts) so we would be
> > curious if the issues we saw before goes away with a new patch for
> > Correctable Errors.  
> 
> My current theory is that there's some issue with that particular
> Samsung NVMe device that causes the Correctable Error flood.  Kai-Heng
> says they happen even with VMD disabled.
> 
> And there are other reports that don't seem to involve VMD but do
> involve this NVMe device ([144d:a80a]):
> 
>   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1852420
>   https://forums.unraid.net/topic/113521-constant-errors-on-logs-after-nvme-upgrade/
>   https://forums.unraid.net/topic/118286-nvme-drives-throwing-errors-filling-logs-instantly-how-to-resolve/
>   https://forum.proxmox.com/threads/pve-kernel-panics-on-reboots.144481/
>   https://www.eevblog.com/forum/general-computing/linux-mint-21-02-clone-replace-1tb-nvme-with-a-2tb-nvme/
> 
> NVMe has weird power management stuff, so it's always possible we're
> doing something wrong in a driver.
> 
> But I think we really need to handle Correctable Errors better by:
> 
>   - Possibly having drivers mask errors if they know about defects
>   - Making the log messages less alarming, e.g.,  a single line report
>   - Rate-limiting them so they're never overwhelming
>   - Maybe automatically masking them in the PCI core to avoid
> interrupts
> 
> > > >   	vmd_bridge->native_aer = root_bridge->native_aer;
> > > >   	vmd_bridge->native_pme = root_bridge->native_pme;
> > > >   	vmd_bridge->native_ltr = root_bridge->native_ltr;
> > > > -- 
> > > > 2.39.1
> > > >   
> > >   
> >   

Hi Bjorn,

Do we still expect to get this patch accepted?

Based on the previous comments, even with AER fixed we will still have
an issue in the guest OS where all of these features are disabled, which
will require adjusting and/or removing 04b12ef163d1.

Is it possible to accept this patch and add the necessary changes when an
AER fix is available?

Thanks
-nirmal
Bjorn Helgaas June 21, 2024, 9:16 p.m. UTC | #5
On Thu, Jun 20, 2024 at 03:41:33PM -0700, Nirmal Patel wrote:
> On Thu, 2 May 2024 17:56:08 -0500
> Bjorn Helgaas <helgaas@kernel.org> wrote:
> 
> > On Thu, May 02, 2024 at 03:38:00PM -0700, Paul M Stillwell Jr wrote:
> > > On 5/2/2024 3:08 PM, Bjorn Helgaas wrote:  
> > > > On Mon, Apr 08, 2024 at 11:39:27AM -0700, Paul M Stillwell Jr
> > > > wrote:  
> > > > > Commit 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe
> > > > > features") added code to copy the _OSC flags from the root
> > > > > bridge to the host bridge for each vmd device because the AER
> > > > > bits were not being set correctly which was causing an AER
> > > > > interrupt storm for certain NVMe devices.
> > > > > 
> > > > > This works fine in bare metal environments, but causes problems
> > > > > when the vmd driver is run in a hypervisor environment. In a
> > > > > hypervisor all the _OSC bits are 0 despite what the underlying
> > > > > hardware indicates. This is a problem for vmd users because if
> > > > > vmd is enabled the user *always* wants hotplug support enabled.
> > > > > To solve this issue the vmd driver always enables the hotplug
> > > > > bits in the host bridge structure for each vmd.
> > > > > 
> > > > > Fixes: 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe
> > > > > features") Signed-off-by: Nirmal Patel
> > > > > <nirmal.patel@linux.intel.com> Signed-off-by: Paul M Stillwell
> > > > > Jr <paul.m.stillwell.jr@intel.com> ---
> > > > >   drivers/pci/controller/vmd.c | 10 ++++++++--
> > > > >   1 file changed, 8 insertions(+), 2 deletions(-)
> > > > > 
> > > > > diff --git a/drivers/pci/controller/vmd.c
> > > > > b/drivers/pci/controller/vmd.c index 87b7856f375a..583b10bd5eb7
> > > > > 100644 --- a/drivers/pci/controller/vmd.c
> > > > > +++ b/drivers/pci/controller/vmd.c
> > > > > @@ -730,8 +730,14 @@ static int vmd_alloc_irqs(struct vmd_dev
> > > > > *vmd) static void vmd_copy_host_bridge_flags(struct
> > > > > pci_host_bridge *root_bridge, struct pci_host_bridge
> > > > > *vmd_bridge) {
> > > > > -	vmd_bridge->native_pcie_hotplug =
> > > > > root_bridge->native_pcie_hotplug;
> > > > > -	vmd_bridge->native_shpc_hotplug =
> > > > > root_bridge->native_shpc_hotplug;
> > > > > +	/*
> > > > > +	 * there is an issue when the vmd driver is running
> > > > > within a hypervisor
> > > > > +	 * because all of the _OSC bits are 0 in that case.
> > > > > this disables
> > > > > +	 * hotplug support, but users who enable VMD in their
> > > > > BIOS always want
> > > > > +	 * hotplug suuport so always enable it.
> > > > > +	 */
> > > > > +	vmd_bridge->native_pcie_hotplug = 1;
> > > > > +	vmd_bridge->native_shpc_hotplug = 1;  
> > > > 
> > > > Deferred for now because I think we need to figure out how to set
> > > > all these bits the same, or at least with a better algorithm than
> > > > "here's what we want in this environment."
> > > > 
> > > > Extended discussion about this at
> > > > https://lore.kernel.org/r/20240417201542.102-1-paul.m.stillwell.jr@intel.com
> > > 
> > > That's ok by me. I thought where we left it was that if we could
> > > find a solution to the Correctable Errors from the original issue
> > > that maybe we could revert 04b12ef163d1.
> > > 
> > > I'm not sure I would know if a patch that fixes the Correctable
> > > Errors comes in... We have a test case we would like to test
> > > against that was pre 04b12ef163d1 (BIOS has AER disabled and we
> > > hotplug a disk which results in AER interrupts) so we would be
> > > curious if the issues we saw before goes away with a new patch for
> > > Correctable Errors.  
> > 
> > My current theory is that there's some issue with that particular
> > Samsung NVMe device that causes the Correctable Error flood.  Kai-Heng
> > says they happen even with VMD disabled.
> > 
> > And there are other reports that don't seem to involve VMD but do
> > involve this NVMe device ([144d:a80a]):
> > 
> >   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1852420
> >   https://forums.unraid.net/topic/113521-constant-errors-on-logs-after-nvme-upgrade/
> >   https://forums.unraid.net/topic/118286-nvme-drives-throwing-errors-filling-logs-instantly-how-to-resolve/
> >   https://forum.proxmox.com/threads/pve-kernel-panics-on-reboots.144481/
> >   https://www.eevblog.com/forum/general-computing/linux-mint-21-02-clone-replace-1tb-nvme-with-a-2tb-nvme/
> > 
> > NVMe has weird power management stuff, so it's always possible we're
> > doing something wrong in a driver.
> > 
> > But I think we really need to handle Correctable Errors better by:
> > 
> >   - Possibly having drivers mask errors if they know about defects
> >   - Making the log messages less alarming, e.g.,  a single line report
> >   - Rate-limiting them so they're never overwhelming
> >   - Maybe automatically masking them in the PCI core to avoid
> > interrupts
> > 
> > > > >   	vmd_bridge->native_aer = root_bridge->native_aer;
> > > > >   	vmd_bridge->native_pme = root_bridge->native_pme;
> > > > >   	vmd_bridge->native_ltr = root_bridge->native_ltr;

> Hi Bjorn,
> 
> Do we still expect to get this patch accepted?
> 
> Based on the previous comments, even with AER fixed we will still have
> an issue in Guest OS of disabling all the features which will require
> making adjustments and/or removing 04b12ef163d1.
> 
> Is it possible to accept this patch and add necessary changes
> when AER fix is available?

I think 04b12ef163d1 is wrong, and we shouldn't copy any of the bits
from the root port.

If vmd takes ownership of all those features, you can decide what to
do with AER, and you can disable it if you want to.
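
In concrete terms, that would mean something like the sketch below, where
vmd stops copying from the root bridge entirely and sets its own policy.
The function name and the specific 0/1 choices for AER/PME/LTR are
illustrative only, not a proposed patch:

static void vmd_set_host_bridge_flags(struct pci_host_bridge *vmd_bridge)
{
	/* VMD owns its domain, so _OSC negotiation does not apply here */
	vmd_bridge->native_pcie_hotplug = 1;
	vmd_bridge->native_shpc_hotplug = 1;

	/* vmd decides these for itself, e.g. keep AER handling off */
	vmd_bridge->native_aer = 0;
	vmd_bridge->native_pme = 1;
	vmd_bridge->native_ltr = 1;
}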

Bjorn
Paul M Stillwell Jr June 25, 2024, 8:42 p.m. UTC | #6
On 6/21/2024 2:16 PM, Bjorn Helgaas wrote:
> On Thu, Jun 20, 2024 at 03:41:33PM -0700, Nirmal Patel wrote:
>> On Thu, 2 May 2024 17:56:08 -0500
>> Bjorn Helgaas <helgaas@kernel.org> wrote:
>>
>>> On Thu, May 02, 2024 at 03:38:00PM -0700, Paul M Stillwell Jr wrote:
>>>> On 5/2/2024 3:08 PM, Bjorn Helgaas wrote:
>>>>> On Mon, Apr 08, 2024 at 11:39:27AM -0700, Paul M Stillwell Jr
>>>>> wrote:
>>>>>> Commit 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe
>>>>>> features") added code to copy the _OSC flags from the root
>>>>>> bridge to the host bridge for each vmd device because the AER
>>>>>> bits were not being set correctly which was causing an AER
>>>>>> interrupt storm for certain NVMe devices.
>>>>>>
>>>>>> This works fine in bare metal environments, but causes problems
>>>>>> when the vmd driver is run in a hypervisor environment. In a
>>>>>> hypervisor all the _OSC bits are 0 despite what the underlying
>>>>>> hardware indicates. This is a problem for vmd users because if
>>>>>> vmd is enabled the user *always* wants hotplug support enabled.
>>>>>> To solve this issue the vmd driver always enables the hotplug
>>>>>> bits in the host bridge structure for each vmd.
>>>>>>
>>>>>> Fixes: 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC on PCIe
>>>>>> features") Signed-off-by: Nirmal Patel
>>>>>> <nirmal.patel@linux.intel.com> Signed-off-by: Paul M Stillwell
>>>>>> Jr <paul.m.stillwell.jr@intel.com> ---
>>>>>>    drivers/pci/controller/vmd.c | 10 ++++++++--
>>>>>>    1 file changed, 8 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/pci/controller/vmd.c
>>>>>> b/drivers/pci/controller/vmd.c index 87b7856f375a..583b10bd5eb7
>>>>>> 100644 --- a/drivers/pci/controller/vmd.c
>>>>>> +++ b/drivers/pci/controller/vmd.c
>>>>>> @@ -730,8 +730,14 @@ static int vmd_alloc_irqs(struct vmd_dev
>>>>>> *vmd) static void vmd_copy_host_bridge_flags(struct
>>>>>> pci_host_bridge *root_bridge, struct pci_host_bridge
>>>>>> *vmd_bridge) {
>>>>>> -	vmd_bridge->native_pcie_hotplug =
>>>>>> root_bridge->native_pcie_hotplug;
>>>>>> -	vmd_bridge->native_shpc_hotplug =
>>>>>> root_bridge->native_shpc_hotplug;
>>>>>> +	/*
>>>>>> +	 * there is an issue when the vmd driver is running
>>>>>> within a hypervisor
>>>>>> +	 * because all of the _OSC bits are 0 in that case.
>>>>>> this disables
>>>>>> +	 * hotplug support, but users who enable VMD in their
>>>>>> BIOS always want
>>>>>> +	 * hotplug suuport so always enable it.
>>>>>> +	 */
>>>>>> +	vmd_bridge->native_pcie_hotplug = 1;
>>>>>> +	vmd_bridge->native_shpc_hotplug = 1;
>>>>>
>>>>> Deferred for now because I think we need to figure out how to set
>>>>> all these bits the same, or at least with a better algorithm than
>>>>> "here's what we want in this environment."
>>>>>
>>>>> Extended discussion about this at
>>>>> https://lore.kernel.org/r/20240417201542.102-1-paul.m.stillwell.jr@intel.com
>>>>
>>>> That's ok by me. I thought where we left it was that if we could
>>>> find a solution to the Correctable Errors from the original issue
>>>> that maybe we could revert 04b12ef163d1.
>>>>
>>>> I'm not sure I would know if a patch that fixes the Correctable
>>>> Errors comes in... We have a test case we would like to test
>>>> against that was pre 04b12ef163d1 (BIOS has AER disabled and we
>>>> hotplug a disk which results in AER interrupts) so we would be
>>>> curious if the issues we saw before goes away with a new patch for
>>>> Correctable Errors.
>>>
>>> My current theory is that there's some issue with that particular
>>> Samsung NVMe device that causes the Correctable Error flood.  Kai-Heng
>>> says they happen even with VMD disabled.
>>>
>>> And there are other reports that don't seem to involve VMD but do
>>> involve this NVMe device ([144d:a80a]):
>>>
>>>    https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1852420
>>>    https://forums.unraid.net/topic/113521-constant-errors-on-logs-after-nvme-upgrade/
>>>    https://forums.unraid.net/topic/118286-nvme-drives-throwing-errors-filling-logs-instantly-how-to-resolve/
>>>    https://forum.proxmox.com/threads/pve-kernel-panics-on-reboots.144481/
>>>    https://www.eevblog.com/forum/general-computing/linux-mint-21-02-clone-replace-1tb-nvme-with-a-2tb-nvme/
>>>
>>> NVMe has weird power management stuff, so it's always possible we're
>>> doing something wrong in a driver.
>>>
>>> But I think we really need to handle Correctable Errors better by:
>>>
>>>    - Possibly having drivers mask errors if they know about defects
>>>    - Making the log messages less alarming, e.g.,  a single line report
>>>    - Rate-limiting them so they're never overwhelming
>>>    - Maybe automatically masking them in the PCI core to avoid
>>> interrupts
>>>
>>>>>>    	vmd_bridge->native_aer = root_bridge->native_aer;
>>>>>>    	vmd_bridge->native_pme = root_bridge->native_pme;
>>>>>>    	vmd_bridge->native_ltr = root_bridge->native_ltr;
> 
>> Hi Bjorn,
>>
>> Do we still expect to get this patch accepted?
>>
>> Based on the previous comments, even with AER fixed we will still have
>> an issue in Guest OS of disabling all the features which will require
>> making adjustments and/or removing 04b12ef163d1.
>>
>> Is it possible to accept this patch and add necessary changes
>> when AER fix is available?
> 
> I think 04b12ef163d1 is wrong, and we shouldn't copy any of the bits
> from the root port.
> 
> If vmd takes ownership of all those features, you can decide what to
> do with AER, and you can disable it if you want to.
> 
> Bjorn
> 

IIRC we are taking ownership of the flags by (eventually) calling
pci_init_host_bridge(), which sets the native_* bits, right?
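
For reference, pci_init_host_bridge() in drivers/pci/probe.c defaults
those flags to native/enabled, roughly as below (quoted from memory, so
the exact field list may differ between kernel versions):

	/* excerpt from pci_init_host_bridge() */
	bridge->native_pcie_hotplug = 1;
	bridge->native_shpc_hotplug = 1;
	bridge->native_aer = 1;
	bridge->native_pme = 1;
	bridge->native_ltr = 1;
	bridge->native_dpc = 1;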

One way we could force another look at the original issue that
04b12ef163d1 fixed would be to revert it and break those use cases. I'm
not sure I'm in favor of that, but right now either VMD is broken in some
cases, or we revert 04b12ef163d1 and break those cases. Either way,
someone is unhappy.

Paul

Patch

diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index 87b7856f375a..583b10bd5eb7 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -730,8 +730,14 @@  static int vmd_alloc_irqs(struct vmd_dev *vmd)
 static void vmd_copy_host_bridge_flags(struct pci_host_bridge *root_bridge,
 				       struct pci_host_bridge *vmd_bridge)
 {
-	vmd_bridge->native_pcie_hotplug = root_bridge->native_pcie_hotplug;
-	vmd_bridge->native_shpc_hotplug = root_bridge->native_shpc_hotplug;
+	/*
+	 * There is an issue when the vmd driver is running within a hypervisor
+	 * because all of the _OSC bits are 0 in that case. This disables
+	 * hotplug support, but users who enable VMD in their BIOS always want
+	 * hotplug support, so always enable it.
+	 */
+	vmd_bridge->native_pcie_hotplug = 1;
+	vmd_bridge->native_shpc_hotplug = 1;
 	vmd_bridge->native_aer = root_bridge->native_aer;
 	vmd_bridge->native_pme = root_bridge->native_pme;
 	vmd_bridge->native_ltr = root_bridge->native_ltr;