Message ID | 20171219210643.24615-1-keith.busch@intel.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
s/approrpiate/appropriate/ (But maybe should be restated altogether, see below.) On Tue, Dec 19, 2017 at 02:06:40PM -0700, Keith Busch wrote: > Getting the AER information is documented to return 0 when it failed to > get the information. I think this case is either impossible (if we only call this function for devices known to support AER), or it fixes an actual bug (the caller would call aer_print_error() when it shouldn't, and potentially print garbage). Right? If the former, I vote for removing the test. If the latter, the changelog should mention that it fixes a bug. > Signed-off-by: Keith Busch <keith.busch@intel.com> > --- > drivers/pci/pcie/aer/aerdrv_core.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c > index 744805232155..ea0dc1cc7fc7 100644 > --- a/drivers/pci/pcie/aer/aerdrv_core.c > +++ b/drivers/pci/pcie/aer/aerdrv_core.c > @@ -660,7 +660,7 @@ static int get_device_error_info(struct pci_dev *dev, struct aer_err_info *info) > > /* The device might not support AER */ > if (!pos) > - return 1; > + return 0; > > if (info->severity == AER_CORRECTABLE) { > pci_read_config_dword(dev, pos + PCI_ERR_COR_STATUS, > -- > 2.13.6 >
Current code APEI AER and native AER seems a little different. APEI AER ,the device must have AER capability, then can do recovery. Native AER seems the device under the ROOT PORT can do not have the AER capability. Current patch seems can fix the Inconsistency. But as PCIe 4.0r1.0 6.2.5 Sequence of Device Error Signaling and Logging Operations describe,It seems the device can implement the Device Status reg, but do not have AER capability.So for such devices, can we use AER driver to call device error handler callback to do recovery? Thanks, Dongdong 在 2017/12/21 9:04, Bjorn Helgaas 写道: > s/approrpiate/appropriate/ > > (But maybe should be restated altogether, see below.) > > On Tue, Dec 19, 2017 at 02:06:40PM -0700, Keith Busch wrote: >> Getting the AER information is documented to return 0 when it failed to >> get the information. > > I think this case is either impossible (if we only call this function > for devices known to support AER), or it fixes an actual bug (the > caller would call aer_print_error() when it shouldn't, and potentially > print garbage). Right? > > If the former, I vote for removing the test. If the latter, the > changelog should mention that it fixes a bug. > >> Signed-off-by: Keith Busch <keith.busch@intel.com> >> --- >> drivers/pci/pcie/aer/aerdrv_core.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c >> index 744805232155..ea0dc1cc7fc7 100644 >> --- a/drivers/pci/pcie/aer/aerdrv_core.c >> +++ b/drivers/pci/pcie/aer/aerdrv_core.c >> @@ -660,7 +660,7 @@ static int get_device_error_info(struct pci_dev *dev, struct aer_err_info *info) >> >> /* The device might not support AER */ >> if (!pos) >> - return 1; >> + return 0; >> >> if (info->severity == AER_CORRECTABLE) { >> pci_read_config_dword(dev, pos + PCI_ERR_COR_STATUS, >> -- >> 2.13.6 >> > > . >
On Wed, Dec 20, 2017 at 05:04:52PM -0800, Bjorn Helgaas wrote: > On Tue, Dec 19, 2017 at 02:06:40PM -0700, Keith Busch wrote: > > Getting the AER information is documented to return 0 when it failed to > > get the information. > > I think this case is either impossible (if we only call this function > for devices known to support AER), or it fixes an actual bug (the > caller would call aer_print_error() when it shouldn't, and potentially > print garbage). Right? > > If the former, I vote for removing the test. If the latter, the > changelog should mention that it fixes a bug. I just spotted this mistake when I was changing the function to non-static. I've never observed bogus data printed in real life. In the current AER handling, the only way we could incorrectly get there is if the root port's Error Source ID Register somehow contains the ID of a device that doesn't have AER capabilities, which would of course be broken. So if we trust hardware, we should are safe to remove the check.
On Tue, Dec 19, 2017 at 02:06:40PM -0700, Keith Busch wrote: > Getting the AER information is documented to return 0 when it failed to > get the information. > > Signed-off-by: Keith Busch <keith.busch@intel.com> Keith, I'm sorry, but I lost track of where we're at with this series. Can you repost a fresh version if it's still applicable? Don't worry about rebasing it to anything other than v4.16-rc1 (my "master" branch). I might have added some conflicting things, but I'll take care of resolving any conflicts. > --- > drivers/pci/pcie/aer/aerdrv_core.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c > index 744805232155..ea0dc1cc7fc7 100644 > --- a/drivers/pci/pcie/aer/aerdrv_core.c > +++ b/drivers/pci/pcie/aer/aerdrv_core.c > @@ -660,7 +660,7 @@ static int get_device_error_info(struct pci_dev *dev, struct aer_err_info *info) > > /* The device might not support AER */ > if (!pos) > - return 1; > + return 0; > > if (info->severity == AER_CORRECTABLE) { > pci_read_config_dword(dev, pos + PCI_ERR_COR_STATUS, > -- > 2.13.6 >
On Tue, Mar 20, 2018 at 04:02:01PM -0700, Bjorn Helgaas wrote: > On Tue, Dec 19, 2017 at 02:06:40PM -0700, Keith Busch wrote: > > Getting the AER information is documented to return 0 when it failed to > > get the information. > > > > Signed-off-by: Keith Busch <keith.busch@intel.com> > > Keith, I'm sorry, but I lost track of where we're at with this series. > Can you repost a fresh version if it's still applicable? > > Don't worry about rebasing it to anything other than v4.16-rc1 (my > "master" branch). I might have added some conflicting things, but > I'll take care of resolving any conflicts. No problem, I think this is still applicable, and can be done without conflicting with what Oza and Sinan are working on (it took a few tries, but it finally clicked for me on how that works). I'll rebase a new series and should have it posted by end of week. Thanks, Keith
diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c index 744805232155..ea0dc1cc7fc7 100644 --- a/drivers/pci/pcie/aer/aerdrv_core.c +++ b/drivers/pci/pcie/aer/aerdrv_core.c @@ -660,7 +660,7 @@ static int get_device_error_info(struct pci_dev *dev, struct aer_err_info *info) /* The device might not support AER */ if (!pos) - return 1; + return 0; if (info->severity == AER_CORRECTABLE) { pci_read_config_dword(dev, pos + PCI_ERR_COR_STATUS,
Getting the AER information is documented to return 0 when it failed to get the information. Signed-off-by: Keith Busch <keith.busch@intel.com> --- drivers/pci/pcie/aer/aerdrv_core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)