Message ID | 20180516213306.27027-1-mr.nuke.me@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
On 5/16/2018 5:33 PM, Alexandru Gagniuc wrote:
> AER status bits are sticky, and they survive system resets. Downstream
> devices are usually taken care of after re-enumerating the downstream
> busses, as the AER bits are cleared during probe().
>
> However, nothing clears the bits of the port which contained the
> error. These sticky bits may lead some BIOSes to think that something
> bad happened, and print ominous messages on next boot. To prevent this,
> tidy up the AER status bits before releasing containment.
>
> Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
> ---
>  drivers/pci/pcie/dpc.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> index 8c57d607e603..bf82d6936556 100644
> --- a/drivers/pci/pcie/dpc.c
> +++ b/drivers/pci/pcie/dpc.c
> @@ -112,6 +112,10 @@ static void dpc_work(struct work_struct *work)
>  		dpc->rp_pio_status = 0;
>  	}
>
> +	/* DPC event made a mess of our AER status bits. Clean them up. */
> +	pci_cleanup_aer_error_status_regs(pdev);
> +	/* TODO: Should we also use aer_print_error to log the event? */
> +
>  	pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS,
>  		PCI_EXP_DPC_STATUS_TRIGGER | PCI_EXP_DPC_STATUS_INTERRUPT);
>

I think Keith has a patch to fix this. It was under review at some point.
On Wed, May 16, 2018 at 06:44:22PM -0400, Sinan Kaya wrote:
> On 5/16/2018 5:33 PM, Alexandru Gagniuc wrote:
> > [...]
>
> I think Keith has a patch to fix this. It was under review at some point.

Right, I do intend to follow up on this, but I've had some trouble
finding time the last few weeks. Sorry about that, things will clear up
for me shortly.
On Wed, May 16, 2018 at 05:12:21PM -0600, Keith Busch wrote:
> On Wed, May 16, 2018 at 06:44:22PM -0400, Sinan Kaya wrote:
> > On 5/16/2018 5:33 PM, Alexandru Gagniuc wrote:
> > > [...]
> >
> > I think Keith has a patch to fix this. It was under review at some point.
>
> Right, I do intend to follow up on this, but I've had some trouble
> finding time the last few weeks. Sorry about that, things will clear up
> for me shortly.

I'll drop this (Alexandru's) patch for now, waiting for your update, Keith.
On 06/19/2018 04:57 PM, Bjorn Helgaas wrote:
> On Wed, May 16, 2018 at 05:12:21PM -0600, Keith Busch wrote:
>> On Wed, May 16, 2018 at 06:44:22PM -0400, Sinan Kaya wrote:
>>> On 5/16/2018 5:33 PM, Alexandru Gagniuc wrote:
>>>> [...]
>>>
>>> I think Keith has a patch to fix this. It was under review at some point.
>>
>> Right, I do intend to follow up on this, but I've had some trouble
>> finding time the last few weeks. Sorry about that, things will clear up
>> for me shortly.
>
> I'll drop this (Alexandru's) patch for now, waiting for your update, Keith.

I wonder if clearing AER status bits is mutually exclusive with
refactoring other parts of DPC handling?

Alex
diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index 8c57d607e603..bf82d6936556 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -112,6 +112,10 @@ static void dpc_work(struct work_struct *work)
 		dpc->rp_pio_status = 0;
 	}

+	/* DPC event made a mess of our AER status bits. Clean them up. */
+	pci_cleanup_aer_error_status_regs(pdev);
+	/* TODO: Should we also use aer_print_error to log the event? */
+
 	pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS,
 		PCI_EXP_DPC_STATUS_TRIGGER | PCI_EXP_DPC_STATUS_INTERRUPT);
AER status bits are sticky, and they survive system resets. Downstream
devices are usually taken care of after re-enumerating the downstream
busses, as the AER bits are cleared during probe().

However, nothing clears the bits of the port which contained the
error. These sticky bits may lead some BIOSes to think that something
bad happened, and print ominous messages on next boot. To prevent this,
tidy up the AER status bits before releasing containment.

Signed-off-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
---
 drivers/pci/pcie/dpc.c | 4 ++++
 1 file changed, 4 insertions(+)
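
To illustrate why the port's bits linger until something explicitly clears them, here is a minimal user-space sketch of a sticky, write-1-to-clear status register. It is not kernel code: `struct sticky_reg` and all of the `reg_*` helpers are hypothetical names made up for this example, and the read-then-write-back idiom in `reg_clear_status()` is an assumption about how a cleanup helper would typically clear such a register, not a copy of `pci_cleanup_aer_error_status_regs()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one sticky, write-1-to-clear status register. */
struct sticky_reg {
	uint32_t status;
};

/* Hardware side: latch error bits. Bits that are already set stay set. */
void reg_latch_error(struct sticky_reg *reg, uint32_t bits)
{
	assert(reg != NULL);
	reg->status |= bits;
}

/* Model of a reset that preserves sticky state: nothing is cleared. */
void reg_reset(struct sticky_reg *reg)
{
	assert(reg != NULL);
	/* Intentionally empty: sticky bits survive the reset. */
}

/* Config read: return the current status bits. */
uint32_t reg_read(const struct sticky_reg *reg)
{
	assert(reg != NULL);
	return reg->status;
}

/*
 * Config write with write-1-to-clear semantics: each 1 bit in 'value'
 * clears the matching status bit, each 0 bit leaves it unchanged.
 */
void reg_write(struct sticky_reg *reg, uint32_t value)
{
	assert(reg != NULL);
	reg->status &= ~value;
}

/*
 * Software side (assumed idiom): read the status and write the same
 * value back, which clears exactly the bits that were set.
 */
void reg_clear_status(struct sticky_reg *reg)
{
	reg_write(reg, reg_read(reg));
}
```

In this model, writing zero is a no-op and a reset changes nothing, so a latched bit stays visible to whoever reads the register next (for example, firmware on the following boot) unless software performs the explicit write-back before moving on.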