Message ID | 1503940184-29423-1-git-send-email-tbaicar@codeaurora.org (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
On Mon, Aug 28, 2017 at 11:09:44AM -0600, Tyler Baicar wrote: > Correctable errors do not need any software intervention, so > avoid calling into the software recovery process for correctable > errors. > > Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org> > --- > drivers/pci/pcie/aer/aerdrv_core.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c > index b1303b3..4765c11 100644 > --- a/drivers/pci/pcie/aer/aerdrv_core.c > +++ b/drivers/pci/pcie/aer/aerdrv_core.c > @@ -626,7 +626,8 @@ static void aer_recover_work_func(struct work_struct *work) > continue; > } > cper_print_aer(pdev, entry.severity, entry.regs); > - do_recovery(pdev, entry.severity); > + if (entry.severity != AER_CORRECTABLE) > + do_recovery(pdev, entry.severity); I think this is fine, and it mirrors what is done in handle_error_source(). But I want to converge the APEI path and the "native" AER path, so as one tiny step in that direction, can you look into doing this test once, e.g., move the test from handle_error_source() into do_recovery(), where one test will handle both paths? > pci_dev_put(pdev); > } > } > -- > Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. > Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, > a Linux Foundation Collaborative Project. >
On 10/2/2017 7:19 PM, Bjorn Helgaas wrote: > On Mon, Aug 28, 2017 at 11:09:44AM -0600, Tyler Baicar wrote: >> Correctable errors do not need any software intervention, so >> avoid calling into the software recovery process for correctable >> errors. >> >> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org> >> --- >> drivers/pci/pcie/aer/aerdrv_core.c | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c >> index b1303b3..4765c11 100644 >> --- a/drivers/pci/pcie/aer/aerdrv_core.c >> +++ b/drivers/pci/pcie/aer/aerdrv_core.c >> @@ -626,7 +626,8 @@ static void aer_recover_work_func(struct work_struct *work) >> continue; >> } >> cper_print_aer(pdev, entry.severity, entry.regs); >> - do_recovery(pdev, entry.severity); >> + if (entry.severity != AER_CORRECTABLE) >> + do_recovery(pdev, entry.severity); > I think this is fine, and it mirrors what is done in > handle_error_source(). > > But I want to converge the APEI path and the "native" AER path, so as > one tiny step in that direction, can you look into doing this test > once, e.g., move the test from handle_error_source() into > do_recovery(), where one test will handle both paths? Hello Bjorn, Thank you for the input! I've looked into this and it seems there is still going to need to be two versions of this check. The native AER path goes through handle_error_source() and for correctable errors requires the write to PCI_ERR_COR_STATUS, but the APEI path does not require this write. I could move this check to the beginning of do_recovery() so it returns right away for correctable errors and remove the else line in handle_error_source() so it always calls into do_recovery(). That doesn't seem like a very clean solution though since then there are still two checks for correctable errors and now we're calling into do_recovery() in both cases for nothing :) Thanks, Tyler
On Wed, Oct 11, 2017 at 10:37:47AM -0400, Tyler Baicar wrote: > On 10/2/2017 7:19 PM, Bjorn Helgaas wrote: > >On Mon, Aug 28, 2017 at 11:09:44AM -0600, Tyler Baicar wrote: > >>Correctable errors do not need any software intervention, so > >>avoid calling into the software recovery process for correctable > >>errors. > >> > >>Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org> > >>--- > >> drivers/pci/pcie/aer/aerdrv_core.c | 3 ++- > >> 1 file changed, 2 insertions(+), 1 deletion(-) > >> > >>diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c > >>index b1303b3..4765c11 100644 > >>--- a/drivers/pci/pcie/aer/aerdrv_core.c > >>+++ b/drivers/pci/pcie/aer/aerdrv_core.c > >>@@ -626,7 +626,8 @@ static void aer_recover_work_func(struct work_struct *work) > >> continue; > >> } > >> cper_print_aer(pdev, entry.severity, entry.regs); > >>- do_recovery(pdev, entry.severity); > >>+ if (entry.severity != AER_CORRECTABLE) > >>+ do_recovery(pdev, entry.severity); > >I think this is fine, and it mirrors what is done in > >handle_error_source(). > > > >But I want to converge the APEI path and the "native" AER path, so as > >one tiny step in that direction, can you look into doing this test > >once, e.g., move the test from handle_error_source() into > >do_recovery(), where one test will handle both paths? > > I've looked into this and it seems there is still going to need to > be two versions of this check. The native AER path goes through > handle_error_source() and for correctable errors requires the write > to PCI_ERR_COR_STATUS, but the APEI path does not require this > write. I could move this check to the beginning of do_recovery() so > it returns right away for correctable errors and remove the else > line in handle_error_source() so it always calls into do_recovery(). > That doesn't seem like a very clean solution though since then there > are still two checks for correctable errors and now we're calling > into do_recovery() in both cases for nothing :) The PCI_ERR_COR_STATUS thing is part of what I see as the problem here. IMHO, the native AER path should collect up the log registers (and do any acknowledgement, e.g., writing PCI_ERR_COR_STATUS) *before* entering the common path. In other words, the Linux code in the native part of AER should be functionally the same as the BIOS code that implements the APEI path. This is a bit of restructuring in the Linux AER code. I haven't looked enough to know how much. If it's impractical, it's impractical. I thought this might be an opportunity for a tiny step in that direction, but if it's not, I guess that's OK. Bjorn
On 10/11/2017 1:09 PM, Bjorn Helgaas wrote: > On Wed, Oct 11, 2017 at 10:37:47AM -0400, Tyler Baicar wrote: >> On 10/2/2017 7:19 PM, Bjorn Helgaas wrote: >>> On Mon, Aug 28, 2017 at 11:09:44AM -0600, Tyler Baicar wrote: >>>> Correctable errors do not need any software intervention, so >>>> avoid calling into the software recovery process for correctable >>>> errors. >>>> >>>> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org> >>>> --- >>>> drivers/pci/pcie/aer/aerdrv_core.c | 3 ++- >>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c >>>> index b1303b3..4765c11 100644 >>>> --- a/drivers/pci/pcie/aer/aerdrv_core.c >>>> +++ b/drivers/pci/pcie/aer/aerdrv_core.c >>>> @@ -626,7 +626,8 @@ static void aer_recover_work_func(struct work_struct *work) >>>> continue; >>>> } >>>> cper_print_aer(pdev, entry.severity, entry.regs); >>>> - do_recovery(pdev, entry.severity); >>>> + if (entry.severity != AER_CORRECTABLE) >>>> + do_recovery(pdev, entry.severity); >>> I think this is fine, and it mirrors what is done in >>> handle_error_source(). >>> >>> But I want to converge the APEI path and the "native" AER path, so as >>> one tiny step in that direction, can you look into doing this test >>> once, e.g., move the test from handle_error_source() into >>> do_recovery(), where one test will handle both paths? >> I've looked into this and it seems there is still going to need to >> be two versions of this check. The native AER path goes through >> handle_error_source() and for correctable errors requires the write >> to PCI_ERR_COR_STATUS, but the APEI path does not require this >> write. I could move this check to the beginning of do_recovery() so >> it returns right away for correctable errors and remove the else >> line in handle_error_source() so it always calls into do_recovery(). >> That doesn't seem like a very clean solution though since then there >> are still two checks for correctable errors and now we're calling >> into do_recovery() in both cases for nothing :) > The PCI_ERR_COR_STATUS thing is part of what I see as the problem > here. IMHO, the native AER path should collect up the log registers > (and do any acknowledgement, e.g., writing PCI_ERR_COR_STATUS) > *before* entering the common path. > > In other words, the Linux code in the native part of AER should be > functionally the same as the BIOS code that implements the APEI path. > > This is a bit of restructuring in the Linux AER code. I haven't > looked enough to know how much. If it's impractical, it's > impractical. I thought this might be an opportunity for a tiny step > in that direction, but if it's not, I guess that's OK. Hello Bjorn, That restructuring doesn't look trivial to do in this patch, so do you think this patch is good for 4.15? Thanks, Tyler
On 10/2/2017 7:19 PM, Bjorn Helgaas wrote: > On Mon, Aug 28, 2017 at 11:09:44AM -0600, Tyler Baicar wrote: >> Correctable errors do not need any software intervention, so >> avoid calling into the software recovery process for correctable >> errors. >> >> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org> >> --- >> drivers/pci/pcie/aer/aerdrv_core.c | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c >> index b1303b3..4765c11 100644 >> --- a/drivers/pci/pcie/aer/aerdrv_core.c >> +++ b/drivers/pci/pcie/aer/aerdrv_core.c >> @@ -626,7 +626,8 @@ static void aer_recover_work_func(struct work_struct *work) >> continue; >> } >> cper_print_aer(pdev, entry.severity, entry.regs); >> - do_recovery(pdev, entry.severity); >> + if (entry.severity != AER_CORRECTABLE) >> + do_recovery(pdev, entry.severity); > I think this is fine, and it mirrors what is done in > handle_error_source(). > Hello, Will this patch be pulled into 4.15? Thanks, Tyler
diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c index b1303b3..4765c11 100644 --- a/drivers/pci/pcie/aer/aerdrv_core.c +++ b/drivers/pci/pcie/aer/aerdrv_core.c @@ -626,7 +626,8 @@ static void aer_recover_work_func(struct work_struct *work) continue; } cper_print_aer(pdev, entry.severity, entry.regs); - do_recovery(pdev, entry.severity); + if (entry.severity != AER_CORRECTABLE) + do_recovery(pdev, entry.severity); pci_dev_put(pdev); } }
Correctable errors do not need any software intervention, so avoid calling into the software recovery process for correctable errors. Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org> --- drivers/pci/pcie/aer/aerdrv_core.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)