Message ID | 20171219210643.24615-4-keith.busch@intel.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
+Oza On 12/19/2017 4:06 PM, Keith Busch wrote: > A DPC enabled device will suppress sending ERR_FATAL and ERR_NONFATAL, > which prevents the AER handler from reporting the details of the > error. This patch will have the DPC driver get the AER status registers > from the downstream port that detected the uncorrectable error, and > print out additional information. > > Signed-off-by: Keith Busch <keith.busch@intel.com> > --- Oza is doing some restructuring to unify DPC and AER error handling path per feedback from Bjorn. It is almost done. He is testing it. Can this patch wait until you review his version? I'm thinking this could be something that can be added to his series instead. Coming back to this patch. The interrupt number for DPC and AER could be the same or different. According to the spec, AER errors are always reported regardless of DPC driver presence (see the famous flow chart). If the interrupt IDs are the same for AER and DPC, your patch would introduce double printing for AER errors. Sinan
On Wed, Dec 20, 2017 at 11:43:07PM -0500, Sinan Kaya wrote: > +Oza > > On 12/19/2017 4:06 PM, Keith Busch wrote: > > A DPC enabled device will suppress sending ERR_FATAL and ERR_NONFATAL, > > which prevents the AER handler from reporting the details of the > > error. This patch will have the DPC driver get the AER status registers > > from the downstream port that detected the uncorrectable error, and > > print out additional information. > > > > Signed-off-by: Keith Busch <keith.busch@intel.com> > > --- > > Oza is doing some restructuring to unify DPC and AER error handling path per > feedback from Bjorn. It is almost done. He is testing it. > > Can this patch wait until you review his version? I'm thinking this could > be something that can be added to his series instead. No problem. > Coming back to this patch. The interrupt number for DPC and AER could be the > same or different. Only if you're talking about root ports. DPC is also a feature of switch downstream ports, which don't generate interrupts on AER events (they don't have a Root Error Command register). > According to the spec, AER errors are always reported > regardless of DPC driver presence (see the famous flow chart). > If the interrupt IDs are the same for AER and DPC, your patch would introduce > double printing for AER errors. The AER Uncorrectable Status Register of the detecting port would indeed be set with the appropriate status if that's the type of error that occured, but when DPC is enabled, the root port never observes an ERR_FATAL/ERR_NONFATAL message required for it to get set the Root Error Status Register. The Linux AER handler requires the Root Error Status be set in order for it to print anything, so I don't think we're at risk of double printing with this patch even if the root port is DPC capable.
Hi Keith, On 12/21/2017 12:12 AM, Keith Busch wrote: > On Wed, Dec 20, 2017 at 11:43:07PM -0500, Sinan Kaya wrote: >> +Oza >> >> On 12/19/2017 4:06 PM, Keith Busch wrote: >>> A DPC enabled device will suppress sending ERR_FATAL and ERR_NONFATAL, >>> which prevents the AER handler from reporting the details of the >>> error. This patch will have the DPC driver get the AER status registers >>> from the downstream port that detected the uncorrectable error, and >>> print out additional information. >>> >>> Signed-off-by: Keith Busch <keith.busch@intel.com> >>> --- >> >> Oza is doing some restructuring to unify DPC and AER error handling path per >> feedback from Bjorn. It is almost done. He is testing it. >> >> Can this patch wait until you review his version? I'm thinking this could >> be something that can be added to his series instead. > > No problem. > >> Coming back to this patch. The interrupt number for DPC and AER could be the >> same or different. > > Only if you're talking about root ports. DPC is also a feature of > switch downstream ports, which don't generate interrupts on AER events > (they don't have a Root Error Command register). > >> According to the spec, AER errors are always reported >> regardless of DPC driver presence (see the famous flow chart). > >> If the interrupt IDs are the same for AER and DPC, your patch would introduce >> double printing for AER errors. > > The AER Uncorrectable Status Register of the detecting port would > indeed be set with the appropriate status if that's the type of error > that occured, but when DPC is enabled, the root port never observes an > ERR_FATAL/ERR_NONFATAL message required for it to get set the Root Error > Status Register. The Linux AER handler requires the Root Error Status > be set in order for it to print anything, so I don't think we're at risk > of double printing with this patch even if the root port is DPC capable. > Coming back to this. Oza posted his patch that integrates DPC into PORTDRV similar to AER driver. "[PATCH v3 0/4] Address error and recovery for AER and DPC" We are looking for feedback on the series. The idea in a nut shell is to collect all endpoint error recovery code into a new file called pcie-err.c and then invoke callbacks before DPC driver shuts down the currently active driver. There is however, still one more problem that has not been tacked. Since the AER status is set when we observe DPC event and nobody is clearing these we won't observe another DPC event until somebody clears these. We can say that we are resetting the endpoints as part of the DPC but we are not touching the switch downstream port or the root port registers. Somebody still needs to clear these in addition to printing whatever information is available in the AER registers. Do you agree? Sinan
Hi Keith, On Tue, Dec 19, 2017 at 02:06:43PM -0700, Keith Busch wrote: > A DPC enabled device will suppress sending ERR_FATAL and ERR_NONFATAL, > which prevents the AER handler from reporting the details of the > error. This patch will have the DPC driver get the AER status registers > from the downstream port that detected the uncorrectable error, and > print out additional information. > > Signed-off-by: Keith Busch <keith.busch@intel.com> > --- > drivers/pci/pcie/pcie-dpc.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/drivers/pci/pcie/pcie-dpc.c b/drivers/pci/pcie/pcie-dpc.c > index ef71a472592c..aefc4fbdcef0 100644 > --- a/drivers/pci/pcie/pcie-dpc.c > +++ b/drivers/pci/pcie/pcie-dpc.c > @@ -44,6 +44,7 @@ struct dpc_dev { > int cap_pos; > bool rp; > u32 rp_pio_status; > + struct aer_err_info info; > }; > > static const char * const rp_pio_error_string[] = { > @@ -111,6 +112,9 @@ static void interrupt_event_handler(struct work_struct *work) > struct pci_dev *dev, *temp, *pdev = dpc->dev->port; > struct pci_bus *parent = pdev->subordinate; > > + if (aer_get_device_error_info(pdev, &dpc->info)) > + aer_print_error(pdev, &dpc->info); I'm confused about this. "pdev" is the DPC-enabled Port, i.e., the Port that received an ERR_FATAL or ERR_NONFATAL Message. The Message was generated by something below "pdev", and that's where the interesting AER logging would have been done. This patch suggests that the DPC port itself would have something interesting logged in its AER capability. Is that really true? I see in sec 6.2.10 (PCIe r4.0) that the DPC port should log the Trigger Reason and the Error Source ID from the Message, but I don't see anything about logging anything in its AER registers. > pci_lock_rescan_remove(); > list_for_each_entry_safe_reverse(dev, temp, &parent->devices, > bus_list) { > @@ -275,6 +279,10 @@ static irqreturn_t dpc_irq(int irq, void *context) > /* show RP PIO error detail information */ > if (reason == 3 && ext_reason == 0) > dpc_process_rp_pio_error(dpc); > + if (reason == 2) > + dpc->info.severity = AER_FATAL; > + else > + dpc->info.severity = AER_NONFATAL; > > schedule_work(&dpc->work); > } > -- > 2.13.6 >
On Fri, Jan 12, 2018 at 05:03:03PM -0600, Bjorn Helgaas wrote: > Hi Keith, > > On Tue, Dec 19, 2017 at 02:06:43PM -0700, Keith Busch wrote: > > A DPC enabled device will suppress sending ERR_FATAL and ERR_NONFATAL, > > which prevents the AER handler from reporting the details of the > > error. This patch will have the DPC driver get the AER status registers > > from the downstream port that detected the uncorrectable error, and > > print out additional information. > > > > Signed-off-by: Keith Busch <keith.busch@intel.com> > > --- > > drivers/pci/pcie/pcie-dpc.c | 8 ++++++++ > > 1 file changed, 8 insertions(+) > > > > diff --git a/drivers/pci/pcie/pcie-dpc.c b/drivers/pci/pcie/pcie-dpc.c > > index ef71a472592c..aefc4fbdcef0 100644 > > --- a/drivers/pci/pcie/pcie-dpc.c > > +++ b/drivers/pci/pcie/pcie-dpc.c > > @@ -44,6 +44,7 @@ struct dpc_dev { > > int cap_pos; > > bool rp; > > u32 rp_pio_status; > > + struct aer_err_info info; > > }; > > > > static const char * const rp_pio_error_string[] = { > > @@ -111,6 +112,9 @@ static void interrupt_event_handler(struct work_struct *work) > > struct pci_dev *dev, *temp, *pdev = dpc->dev->port; > > struct pci_bus *parent = pdev->subordinate; > > > > + if (aer_get_device_error_info(pdev, &dpc->info)) > > + aer_print_error(pdev, &dpc->info); > > I'm confused about this. "pdev" is the DPC-enabled Port, i.e., the > Port that received an ERR_FATAL or ERR_NONFATAL Message. The Message > was generated by something below "pdev", and that's where the > interesting AER logging would have been done. > > This patch suggests that the DPC port itself would have something > interesting logged in its AER capability. Is that really true? I see > in sec 6.2.10 (PCIe r4.0) that the DPC port should log the Trigger > Reason and the Error Source ID from the Message, but I don't see > anything about logging anything in its AER registers. If the trigger reason is "unmasked uncorrectable error", then the DPC AER register has useful information. All other reasons would not provide useful data in the AER register. I would assume that aer_get_device_error_info would return false if it's one of other reasons, but I can fix this up to check the trigger reasoning rather than unconditionally reading the AER status
On 1/13/2018 8:35 PM, Keith Busch wrote: > On Fri, Jan 12, 2018 at 05:03:03PM -0600, Bjorn Helgaas wrote: >> Hi Keith, >> >> On Tue, Dec 19, 2017 at 02:06:43PM -0700, Keith Busch wrote: >>> A DPC enabled device will suppress sending ERR_FATAL and ERR_NONFATAL, >>> which prevents the AER handler from reporting the details of the >>> error. This patch will have the DPC driver get the AER status registers >>> from the downstream port that detected the uncorrectable error, and >>> print out additional information. >>> >>> Signed-off-by: Keith Busch <keith.busch@intel.com> >>> --- >>> drivers/pci/pcie/pcie-dpc.c | 8 ++++++++ >>> 1 file changed, 8 insertions(+) >>> >>> diff --git a/drivers/pci/pcie/pcie-dpc.c b/drivers/pci/pcie/pcie-dpc.c >>> index ef71a472592c..aefc4fbdcef0 100644 >>> --- a/drivers/pci/pcie/pcie-dpc.c >>> +++ b/drivers/pci/pcie/pcie-dpc.c >>> @@ -44,6 +44,7 @@ struct dpc_dev { >>> int cap_pos; >>> bool rp; >>> u32 rp_pio_status; >>> + struct aer_err_info info; >>> }; >>> >>> static const char * const rp_pio_error_string[] = { >>> @@ -111,6 +112,9 @@ static void interrupt_event_handler(struct work_struct *work) >>> struct pci_dev *dev, *temp, *pdev = dpc->dev->port; >>> struct pci_bus *parent = pdev->subordinate; >>> >>> + if (aer_get_device_error_info(pdev, &dpc->info)) >>> + aer_print_error(pdev, &dpc->info); >> >> I'm confused about this. "pdev" is the DPC-enabled Port, i.e., the >> Port that received an ERR_FATAL or ERR_NONFATAL Message. The Message >> was generated by something below "pdev", and that's where the >> interesting AER logging would have been done. >> >> This patch suggests that the DPC port itself would have something >> interesting logged in its AER capability. Is that really true? I see >> in sec 6.2.10 (PCIe r4.0) that the DPC port should log the Trigger >> Reason and the Error Source ID from the Message, but I don't see >> anything about logging anything in its AER registers. > > If the trigger reason is "unmasked uncorrectable error", then the DPC > AER register has useful information. All other reasons would not provide > useful data in the AER register. > > > I would assume that aer_get_device_error_info would return false if > it's one of other reasons, but I can fix this up to check the trigger > reasoning rather than unconditionally reading the AER status > https://pcisig.com/sites/default/files/specification_documents/ECN_DPC_2012-02-09_finalized.pdf If you look at figure 6-2, AER errors are always logged regardless of the DPC presence. That's why, I was suggesting Keith that code should also clear the AER information when such an event happens. Otherwise, same event won't trigger again.
On Wed, Jan 10, 2018 at 10:52:23AM -0500, Sinan Kaya wrote: > Since the AER status is set when we observe DPC event and nobody is clearing these > we won't observe another DPC event until somebody clears these. We can say that > we are resetting the endpoints as part of the DPC but we are not touching the > switch downstream port or the root port registers. > > Somebody still needs to clear these in addition to printing whatever information > is available in the AER registers. > > Do you agree? We should clear the downstream port's AER uncorrectable status register if the trigger reason is of that type, but DPC definitely does not require we clear it in order to observe a new event.
On Sat, Jan 13, 2018 at 06:35:18PM -0700, Keith Busch wrote: > On Fri, Jan 12, 2018 at 05:03:03PM -0600, Bjorn Helgaas wrote: > > Hi Keith, > > > > On Tue, Dec 19, 2017 at 02:06:43PM -0700, Keith Busch wrote: > > > A DPC enabled device will suppress sending ERR_FATAL and ERR_NONFATAL, > > > which prevents the AER handler from reporting the details of the > > > error. This patch will have the DPC driver get the AER status registers > > > from the downstream port that detected the uncorrectable error, and > > > print out additional information. > > > > > > Signed-off-by: Keith Busch <keith.busch@intel.com> > > > --- > > > drivers/pci/pcie/pcie-dpc.c | 8 ++++++++ > > > 1 file changed, 8 insertions(+) > > > > > > diff --git a/drivers/pci/pcie/pcie-dpc.c b/drivers/pci/pcie/pcie-dpc.c > > > index ef71a472592c..aefc4fbdcef0 100644 > > > --- a/drivers/pci/pcie/pcie-dpc.c > > > +++ b/drivers/pci/pcie/pcie-dpc.c > > > @@ -44,6 +44,7 @@ struct dpc_dev { > > > int cap_pos; > > > bool rp; > > > u32 rp_pio_status; > > > + struct aer_err_info info; > > > }; > > > > > > static const char * const rp_pio_error_string[] = { > > > @@ -111,6 +112,9 @@ static void interrupt_event_handler(struct work_struct *work) > > > struct pci_dev *dev, *temp, *pdev = dpc->dev->port; > > > struct pci_bus *parent = pdev->subordinate; > > > > > > + if (aer_get_device_error_info(pdev, &dpc->info)) > > > + aer_print_error(pdev, &dpc->info); > > > > I'm confused about this. "pdev" is the DPC-enabled Port, i.e., the > > Port that received an ERR_FATAL or ERR_NONFATAL Message. The Message > > was generated by something below "pdev", and that's where the > > interesting AER logging would have been done. > > > > This patch suggests that the DPC port itself would have something > > interesting logged in its AER capability. Is that really true? I see > > in sec 6.2.10 (PCIe r4.0) that the DPC port should log the Trigger > > Reason and the Error Source ID from the Message, but I don't see > > anything about logging anything in its AER registers. > > If the trigger reason is "unmasked uncorrectable error", then the DPC > AER register has useful information. All other reasons would not provide > useful data in the AER register. OK, that makes sense. This "unmasked uncorrectable error" case would be things like the Port receiving an Unsupported Request completion, a Completion Timeout, Completer Abort, etc. In that case, the DPC Port should do its own AER logging before triggering DPC. > I would assume that aer_get_device_error_info would return false if > it's one of other reasons, but I can fix this up to check the trigger > reasoning rather than unconditionally reading the AER status It might be overkill, but it does niggle at me that in these other cases, the hardware isn't telling us to read the AER logging, but we do it anyway. We currently pass no information to interrupt_event_handler(), so it has no way of testing the trigger reason. I guess you could do the test in dpc_irq() before scheduling interrupt_event_handler()? That would move it from the work item to the ISR, but I guess it's essentially the same sort of code as dpc_process_rp_pio_error(), which we already do in the ISR. Bjorn
On Mon, Jan 15, 2018 at 09:32:59AM -0500, Sinan Kaya wrote: > On 1/13/2018 8:35 PM, Keith Busch wrote: > > On Fri, Jan 12, 2018 at 05:03:03PM -0600, Bjorn Helgaas wrote: > >> Hi Keith, > >> > >> On Tue, Dec 19, 2017 at 02:06:43PM -0700, Keith Busch wrote: > >>> A DPC enabled device will suppress sending ERR_FATAL and ERR_NONFATAL, > >>> which prevents the AER handler from reporting the details of the > >>> error. This patch will have the DPC driver get the AER status registers > >>> from the downstream port that detected the uncorrectable error, and > >>> print out additional information. > >>> > >>> Signed-off-by: Keith Busch <keith.busch@intel.com> > >>> --- > >>> drivers/pci/pcie/pcie-dpc.c | 8 ++++++++ > >>> 1 file changed, 8 insertions(+) > >>> > >>> diff --git a/drivers/pci/pcie/pcie-dpc.c b/drivers/pci/pcie/pcie-dpc.c > >>> index ef71a472592c..aefc4fbdcef0 100644 > >>> --- a/drivers/pci/pcie/pcie-dpc.c > >>> +++ b/drivers/pci/pcie/pcie-dpc.c > >>> @@ -44,6 +44,7 @@ struct dpc_dev { > >>> int cap_pos; > >>> bool rp; > >>> u32 rp_pio_status; > >>> + struct aer_err_info info; > >>> }; > >>> > >>> static const char * const rp_pio_error_string[] = { > >>> @@ -111,6 +112,9 @@ static void interrupt_event_handler(struct work_struct *work) > >>> struct pci_dev *dev, *temp, *pdev = dpc->dev->port; > >>> struct pci_bus *parent = pdev->subordinate; > >>> > >>> + if (aer_get_device_error_info(pdev, &dpc->info)) > >>> + aer_print_error(pdev, &dpc->info); > >> > >> I'm confused about this. "pdev" is the DPC-enabled Port, i.e., the > >> Port that received an ERR_FATAL or ERR_NONFATAL Message. The Message > >> was generated by something below "pdev", and that's where the > >> interesting AER logging would have been done. > >> > >> This patch suggests that the DPC port itself would have something > >> interesting logged in its AER capability. Is that really true? I see > >> in sec 6.2.10 (PCIe r4.0) that the DPC port should log the Trigger > >> Reason and the Error Source ID from the Message, but I don't see > >> anything about logging anything in its AER registers. > > > > If the trigger reason is "unmasked uncorrectable error", then the DPC > > AER register has useful information. All other reasons would not provide > > useful data in the AER register. > > > > > > I would assume that aer_get_device_error_info would return false if > > it's one of other reasons, but I can fix this up to check the trigger > > reasoning rather than unconditionally reading the AER status > > > > https://pcisig.com/sites/default/files/specification_documents/ECN_DPC_2012-02-09_finalized.pdf > > If you look at figure 6-2, AER errors are always logged regardless > of the DPC presence. Figure 6-2 applies to "Errors from Table 6-2, 6-3, or 6-4". (I suspect it should also include Table 6-5 because PCIe r4.0, sec 6.2.4 says Table 6-5 corresponds to AER status bits). But in any case, those tables don't include receipt of ERR_FATAL or ERR_NONFATAL messages. So I don't think a DPC Port should be logging anything in its own AER registers when it receives those messages. If a DPC Port receives ERR_FATAL or ERR_NONFATAL while DPC is enabled, it will trigger DPC and drop the message. If DPC were not enabled, I think it would merely forward the message toward the Root Complex without logging an error itself.
On Mon, Jan 15, 2018 at 07:47:41PM -0700, Keith Busch wrote: > On Wed, Jan 10, 2018 at 10:52:23AM -0500, Sinan Kaya wrote: > > Since the AER status is set when we observe DPC event and nobody > > is clearing these we won't observe another DPC event until > > somebody clears these. We can say that we are resetting the > > endpoints as part of the DPC but we are not touching the switch > > downstream port or the root port registers. > > > > Somebody still needs to clear these in addition to printing > > whatever information is available in the AER registers. > > We should clear the downstream port's AER uncorrectable status > register if the trigger reason is of that type, but DPC definitely > does not require we clear it in order to observe a new event. I don't quite understand this. We don't need to clear the AER status in order to observe a new DPC event, but I think we *do* need to clear the AER status in order to log future AER events. Don't we? I think Sinan is saying that if a DPC Port observes its own unmasked uncorrectable error (i.e., not something it learned about by receiving an ERR_* message), it will set a bit in an AER status register. Since we do not set DPC ERR_COR Enable, the DPC Port does not generate an ERR_COR Message, so the AER driver never learns about the error and never clears the AER status register. So we'll decode the AER status (with your current patch), but we don't clear it, so if another error occurs, the AER logging won't work correctly. Maybe we should be setting DPC ERR_COR Enable and letting the AER driver decode and clear the AER info?
On Tue, Jan 16, 2018 at 06:56:00PM -0600, Bjorn Helgaas wrote: > On Mon, Jan 15, 2018 at 07:47:41PM -0700, Keith Busch wrote: > > On Wed, Jan 10, 2018 at 10:52:23AM -0500, Sinan Kaya wrote: > > > Since the AER status is set when we observe DPC event and nobody > > > is clearing these we won't observe another DPC event until > > > somebody clears these. We can say that we are resetting the > > > endpoints as part of the DPC but we are not touching the switch > > > downstream port or the root port registers. > > > > > > Somebody still needs to clear these in addition to printing > > > whatever information is available in the AER registers. > > > > We should clear the downstream port's AER uncorrectable status > > register if the trigger reason is of that type, but DPC definitely > > does not require we clear it in order to observe a new event. > > I don't quite understand this. We don't need to clear the AER status > in order to observe a new DPC event, but I think we *do* need to clear > the AER status in order to log future AER events. Don't we? Absolutely, I totally agree on this. I have a v2 ready to send that calls pci_cleanup_aer_uncorrect_error_status after printing the error. And I've also made sure to do this only for the appropriate trigger reason that you mentioned on the other thread. > I think Sinan is saying that if a DPC Port observes its own unmasked > uncorrectable error (i.e., not something it learned about by receiving > an ERR_* message), it will set a bit in an AER status register. > > Since we do not set DPC ERR_COR Enable, the DPC Port does not generate > an ERR_COR Message, so the AER driver never learns about the error and > never clears the AER status register. So we'll decode the AER status > (with your current patch), but we don't clear it, so if another error > occurs, the AER logging won't work correctly. > > Maybe we should be setting DPC ERR_COR Enable and letting the AER > driver decode and clear the AER info? Yes, enabling ERR_COR should be no problem and will provide those platform notification benefits. That alone will not clear the uncorrectable status, though. The AER driver will only check correctable status registers on that message. The spec doesn't appear to define what we should find in the downstream port's correctable status, so I can't tell if that will actually log new information or if it's simply to assist platform notification that something happened. I'll post the v2 for consideration, and inlcude the ERR_COR enabling. This does clash a bit with Oza's approach, but it should lower churn for the near term.
On 1/16/2018 7:56 PM, Bjorn Helgaas wrote: > I think Sinan is saying that if a DPC Port observes its own unmasked > uncorrectable error (i.e., not something it learned about by receiving > an ERR_* message), it will set a bit in an AER status register. > > Since we do not set DPC ERR_COR Enable, the DPC Port does not generate > an ERR_COR Message, so the AER driver never learns about the error and > never clears the AER status register. So we'll decode the AER status > (with your current patch), but we don't clear it, so if another error > occurs, the AER logging won't work correctly. Yes, this is what I was meaning. I see that Keith took care of this in the new series.
diff --git a/drivers/pci/pcie/pcie-dpc.c b/drivers/pci/pcie/pcie-dpc.c index ef71a472592c..aefc4fbdcef0 100644 --- a/drivers/pci/pcie/pcie-dpc.c +++ b/drivers/pci/pcie/pcie-dpc.c @@ -44,6 +44,7 @@ struct dpc_dev { int cap_pos; bool rp; u32 rp_pio_status; + struct aer_err_info info; }; static const char * const rp_pio_error_string[] = { @@ -111,6 +112,9 @@ static void interrupt_event_handler(struct work_struct *work) struct pci_dev *dev, *temp, *pdev = dpc->dev->port; struct pci_bus *parent = pdev->subordinate; + if (aer_get_device_error_info(pdev, &dpc->info)) + aer_print_error(pdev, &dpc->info); + pci_lock_rescan_remove(); list_for_each_entry_safe_reverse(dev, temp, &parent->devices, bus_list) { @@ -275,6 +279,10 @@ static irqreturn_t dpc_irq(int irq, void *context) /* show RP PIO error detail information */ if (reason == 3 && ext_reason == 0) dpc_process_rp_pio_error(dpc); + if (reason == 2) + dpc->info.severity = AER_FATAL; + else + dpc->info.severity = AER_NONFATAL; schedule_work(&dpc->work); }
A DPC enabled device will suppress sending ERR_FATAL and ERR_NONFATAL, which prevents the AER handler from reporting the details of the error. This patch will have the DPC driver get the AER status registers from the downstream port that detected the uncorrectable error, and print out additional information. Signed-off-by: Keith Busch <keith.busch@intel.com> --- drivers/pci/pcie/pcie-dpc.c | 8 ++++++++ 1 file changed, 8 insertions(+)