Message ID | 20201009025251.2360659-1-hedi.berriche@hpe.com (mailing list archive) |
---|---|
State | Superseded, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
Series | [v1,1/1] PCI/ERR: don't clobber status after reset_link() | expand |
On Fri, Oct 09, 2020 at 03:52:51AM +0100, Hedi Berriche wrote: > Commit 6d2c89441571 ("PCI/ERR: Update error status after reset_link()") > changed pcie_do_recovery() so that status is updated with the return > value from reset_link(); this was to fix the problem where we would > wrongly report recovery failure, despite a successful reset_link(), > whenever the initial error status is PCI_ERS_RESULT_DISCONNECT or > PCI_ERS_RESULT_NO_AER_DRIVER. > > Unfortunately this breaks the flow of pcie_do_recovery() as it prevents What is the reference to "this breaks" above? > the actions needed when the initial error is PCI_ERS_RESULT_CAN_RECOVER > or PCI_ERS_RESULT_NEED_RESET from taking place which causes error > recovery to fail. > > Don't clobber status after reset_link() to restore the intended flow in > pcie_do_recovery(). > > Fix the original problem by saving the return value from reset_link() > and use it later on to decide whether error recovery should be deemed > successful in the scenarios where the initial error status is > PCI_ERS_RESULT_{DISCONNECT,NO_AER_DRIVER}. I would rather rephrase the above to make it clear what is being proposed. Since the description seems to talk about the old problem and new solution all mixed up. > > Fixes: 6d2c89441571 ("PCI/ERR: Update error status after reset_link()") > Signed-off-by: Hedi Berriche <hedi.berriche@hpe.com> > Cc: Russ Anderson <rja@hpe.com> > Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> > Cc: Bjorn Helgaas <bhelgaas@google.com> > Cc: Ashok Raj <ashok.raj@intel.com> > Cc: Keith Busch <keith.busch@intel.com> > Cc: Joerg Roedel <jroedel@suse.com> > > Cc: stable@kernel.org # v5.7+ > --- > drivers/pci/pcie/err.c | 13 ++++++++++--- > 1 file changed, 10 insertions(+), 3 deletions(-) > > diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c > index c543f419d8f9..dbd0b56bd6c1 100644 > --- a/drivers/pci/pcie/err.c > +++ b/drivers/pci/pcie/err.c > @@ -150,7 +150,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, > pci_channel_state_t state, > pci_ers_result_t (*reset_link)(struct pci_dev *pdev)) > { > - pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER; > + pci_ers_result_t post_reset_status, status = PCI_ERS_RESULT_CAN_RECOVER; why call it post_reset_status? > struct pci_bus *bus; > > /* > @@ -165,8 +165,8 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, > pci_dbg(dev, "broadcast error_detected message\n"); > if (state == pci_channel_io_frozen) { > pci_walk_bus(bus, report_frozen_detected, &status); > - status = reset_link(dev); > - if (status != PCI_ERS_RESULT_RECOVERED) { > + post_reset_status = reset_link(dev); > + if (post_reset_status != PCI_ERS_RESULT_RECOVERED) { > pci_warn(dev, "link reset failed\n"); > goto failed; > } > @@ -174,6 +174,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, > pci_walk_bus(bus, report_normal_detected, &status); > } > > + if ((status == PCI_ERS_RESULT_DISCONNECT || > + status == PCI_ERS_RESULT_NO_AER_DRIVER) && > + post_reset_status == PCI_ERS_RESULT_RECOVERED) { > + /* error recovery succeeded thanks to reset_link() */ > + status = PCI_ERS_RESULT_RECOVERED; > + } > + > if (status == PCI_ERS_RESULT_CAN_RECOVER) { > status = PCI_ERS_RESULT_RECOVERED; > pci_dbg(dev, "broadcast mmio_enabled message\n"); > -- > 2.28.0 >
On Fri, Oct 09, 2020 at 04:46 Raj, Ashok wrote: Hi Ashok, Thanks for looking into this. >On Fri, Oct 09, 2020 at 03:52:51AM +0100, Hedi Berriche wrote: >> Commit 6d2c89441571 ("PCI/ERR: Update error status after reset_link()") >> changed pcie_do_recovery() so that status is updated with the return >> value from reset_link(); this was to fix the problem where we would >> wrongly report recovery failure, despite a successful reset_link(), >> whenever the initial error status is PCI_ERS_RESULT_DISCONNECT or >> PCI_ERS_RESULT_NO_AER_DRIVER. >> >> Unfortunately this breaks the flow of pcie_do_recovery() as it prevents > >What is the reference to "this breaks" above? The code change introduced by commit 6d2c89441571; would "this code change" instead of "this breaks" work better? If not, I can also rephrase the whole paragraph along the following lines: Commit 6d2c89441571 ("PCI/ERR: Update error status after reset_link()") breaks the flow of pcie_do_recovery() as it prevents the actions needed when the initial error is PCI_ERS_RESULT_CAN_RECOVER or PCI_ERS_RESULT_NEED_RESET from taking place which causes error recovery to fail. ... and do away with the first paragraph. >> the actions needed when the initial error is PCI_ERS_RESULT_CAN_RECOVER >> or PCI_ERS_RESULT_NEED_RESET from taking place which causes error >> recovery to fail. >> >> Don't clobber status after reset_link() to restore the intended flow in >> pcie_do_recovery(). >> >> Fix the original problem by saving the return value from reset_link() >> and use it later on to decide whether error recovery should be deemed >> successful in the scenarios where the initial error status is >> PCI_ERS_RESULT_{DISCONNECT,NO_AER_DRIVER}. > >I would rather rephrase the above to make it clear what is being proposed. >Since the description seems to talk about the old problem and new solution >all mixed up. OK; will do that to clarify that what's being proposed here is: 1. fix the regression introduced by commit 6d2c89441571 2. address the problem that commit 6d2c89441571 aimed to fix >> Fixes: 6d2c89441571 ("PCI/ERR: Update error status after reset_link()") >> Signed-off-by: Hedi Berriche <hedi.berriche@hpe.com> >> Cc: Russ Anderson <rja@hpe.com> >> Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> >> Cc: Bjorn Helgaas <bhelgaas@google.com> >> Cc: Ashok Raj <ashok.raj@intel.com> >> Cc: Keith Busch <keith.busch@intel.com> >> Cc: Joerg Roedel <jroedel@suse.com> >> >> Cc: stable@kernel.org # v5.7+ >> --- >> drivers/pci/pcie/err.c | 13 ++++++++++--- >> 1 file changed, 10 insertions(+), 3 deletions(-) >> >> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c >> index c543f419d8f9..dbd0b56bd6c1 100644 >> --- a/drivers/pci/pcie/err.c >> +++ b/drivers/pci/pcie/err.c >> @@ -150,7 +150,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, >> pci_channel_state_t state, >> pci_ers_result_t (*reset_link)(struct pci_dev *pdev)) >> { >> - pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER; >> + pci_ers_result_t post_reset_status, status = PCI_ERS_RESULT_CAN_RECOVER; > >why call it post_reset_status? Perhaps post_reset_status is not a great choice; would reset_result or reset_link_result be better? Cheers, Hedi. > >> struct pci_bus *bus; >> >> /* >> @@ -165,8 +165,8 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, >> pci_dbg(dev, "broadcast error_detected message\n"); >> if (state == pci_channel_io_frozen) { >> pci_walk_bus(bus, report_frozen_detected, &status); >> - status = reset_link(dev); >> - if (status != PCI_ERS_RESULT_RECOVERED) { >> + post_reset_status = reset_link(dev); >> + if (post_reset_status != PCI_ERS_RESULT_RECOVERED) { >> pci_warn(dev, "link reset failed\n"); >> goto failed; >> } >> @@ -174,6 +174,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, >> pci_walk_bus(bus, report_normal_detected, &status); >> } >> >> + if ((status == PCI_ERS_RESULT_DISCONNECT || >> + status == PCI_ERS_RESULT_NO_AER_DRIVER) && >> + post_reset_status == PCI_ERS_RESULT_RECOVERED) { >> + /* error recovery succeeded thanks to reset_link() */ >> + status = PCI_ERS_RESULT_RECOVERED; >> + } >> + >> if (status == PCI_ERS_RESULT_CAN_RECOVER) { >> status = PCI_ERS_RESULT_RECOVERED; >> pci_dbg(dev, "broadcast mmio_enabled message\n"); >> -- >> 2.28.0 >>
On Fri, Oct 09, 2020 at 05:09 Hedi Berriche wrote: >On Fri, Oct 09, 2020 at 04:46 Raj, Ashok wrote: > >Hi Ashok, > >Thanks for looking into this. > >>On Fri, Oct 09, 2020 at 03:52:51AM +0100, Hedi Berriche wrote: >>>Commit 6d2c89441571 ("PCI/ERR: Update error status after reset_link()") >>>changed pcie_do_recovery() so that status is updated with the return >>>value from reset_link(); this was to fix the problem where we would >>>wrongly report recovery failure, despite a successful reset_link(), >>>whenever the initial error status is PCI_ERS_RESULT_DISCONNECT or >>>PCI_ERS_RESULT_NO_AER_DRIVER. >>> >>>Unfortunately this breaks the flow of pcie_do_recovery() as it prevents >> >>What is the reference to "this breaks" above? > >The code change introduced by commit 6d2c89441571; would > > "this code change" instead of "this breaks" > >work better? If not, I can also rephrase the whole paragraph along the following lines: > >Commit 6d2c89441571 ("PCI/ERR: Update error status after reset_link()") breaks the flow >of pcie_do_recovery() as it prevents the actions needed when the initial error is >PCI_ERS_RESULT_CAN_RECOVER or PCI_ERS_RESULT_NEED_RESET from taking place which causes >error recovery to fail. > >... and do away with the first paragraph. > >>>the actions needed when the initial error is PCI_ERS_RESULT_CAN_RECOVER >>>or PCI_ERS_RESULT_NEED_RESET from taking place which causes error >>>recovery to fail. >>> >>>Don't clobber status after reset_link() to restore the intended flow in >>>pcie_do_recovery(). >>> >>>Fix the original problem by saving the return value from reset_link() >>>and use it later on to decide whether error recovery should be deemed >>>successful in the scenarios where the initial error status is >>>PCI_ERS_RESULT_{DISCONNECT,NO_AER_DRIVER}. >> >>I would rather rephrase the above to make it clear what is being proposed. >>Since the description seems to talk about the old problem and new solution >>all mixed up. > >OK; will do that to clarify that what's being proposed here is: > > 1. fix the regression introduced by commit 6d2c89441571 > 2. address the problem that commit 6d2c89441571 aimed to fix > >>>Fixes: 6d2c89441571 ("PCI/ERR: Update error status after reset_link()") >>>Signed-off-by: Hedi Berriche <hedi.berriche@hpe.com> >>>Cc: Russ Anderson <rja@hpe.com> >>>Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> >>>Cc: Bjorn Helgaas <bhelgaas@google.com> >>>Cc: Ashok Raj <ashok.raj@intel.com> >>>Cc: Keith Busch <keith.busch@intel.com> >>>Cc: Joerg Roedel <jroedel@suse.com> >>> >>>Cc: stable@kernel.org # v5.7+ >>>--- >>> drivers/pci/pcie/err.c | 13 ++++++++++--- >>> 1 file changed, 10 insertions(+), 3 deletions(-) >>> >>>diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c >>>index c543f419d8f9..dbd0b56bd6c1 100644 >>>--- a/drivers/pci/pcie/err.c >>>+++ b/drivers/pci/pcie/err.c >>>@@ -150,7 +150,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, >>> pci_channel_state_t state, >>> pci_ers_result_t (*reset_link)(struct pci_dev *pdev)) >>> { >>>- pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER; >>>+ pci_ers_result_t post_reset_status, status = PCI_ERS_RESULT_CAN_RECOVER; >> >>why call it post_reset_status? > >Perhaps post_reset_status is not a great choice; would reset_result or reset_link_result be better? ... or just do this with a boolean instead as I had it in an earlier iteration of the patch before I eventually opted to use an pci_ers_result_t. Cheers, Hedi. > >Cheers, >Hedi. > >> >>> struct pci_bus *bus; >>> >>> /* >>>@@ -165,8 +165,8 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, >>> pci_dbg(dev, "broadcast error_detected message\n"); >>> if (state == pci_channel_io_frozen) { >>> pci_walk_bus(bus, report_frozen_detected, &status); >>>- status = reset_link(dev); >>>- if (status != PCI_ERS_RESULT_RECOVERED) { >>>+ post_reset_status = reset_link(dev); >>>+ if (post_reset_status != PCI_ERS_RESULT_RECOVERED) { >>> pci_warn(dev, "link reset failed\n"); >>> goto failed; >>> } >>>@@ -174,6 +174,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, >>> pci_walk_bus(bus, report_normal_detected, &status); >>> } >>> >>>+ if ((status == PCI_ERS_RESULT_DISCONNECT || >>>+ status == PCI_ERS_RESULT_NO_AER_DRIVER) && >>>+ post_reset_status == PCI_ERS_RESULT_RECOVERED) { >>>+ /* error recovery succeeded thanks to reset_link() */ >>>+ status = PCI_ERS_RESULT_RECOVERED; >>>+ } >>>+ >>> if (status == PCI_ERS_RESULT_CAN_RECOVER) { >>> status = PCI_ERS_RESULT_RECOVERED; >>> pci_dbg(dev, "broadcast mmio_enabled message\n"); >>>-- >>>2.28.0 >>> > >-- >Be careful of reading health books, you might die of a misprint. > -- Mark Twain
diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c index c543f419d8f9..dbd0b56bd6c1 100644 --- a/drivers/pci/pcie/err.c +++ b/drivers/pci/pcie/err.c @@ -150,7 +150,7 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, pci_channel_state_t state, pci_ers_result_t (*reset_link)(struct pci_dev *pdev)) { - pci_ers_result_t status = PCI_ERS_RESULT_CAN_RECOVER; + pci_ers_result_t post_reset_status, status = PCI_ERS_RESULT_CAN_RECOVER; struct pci_bus *bus; /* @@ -165,8 +165,8 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, pci_dbg(dev, "broadcast error_detected message\n"); if (state == pci_channel_io_frozen) { pci_walk_bus(bus, report_frozen_detected, &status); - status = reset_link(dev); - if (status != PCI_ERS_RESULT_RECOVERED) { + post_reset_status = reset_link(dev); + if (post_reset_status != PCI_ERS_RESULT_RECOVERED) { pci_warn(dev, "link reset failed\n"); goto failed; } @@ -174,6 +174,13 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, pci_walk_bus(bus, report_normal_detected, &status); } + if ((status == PCI_ERS_RESULT_DISCONNECT || + status == PCI_ERS_RESULT_NO_AER_DRIVER) && + post_reset_status == PCI_ERS_RESULT_RECOVERED) { + /* error recovery succeeded thanks to reset_link() */ + status = PCI_ERS_RESULT_RECOVERED; + } + if (status == PCI_ERS_RESULT_CAN_RECOVER) { status = PCI_ERS_RESULT_RECOVERED; pci_dbg(dev, "broadcast mmio_enabled message\n");
Commit 6d2c89441571 ("PCI/ERR: Update error status after reset_link()") changed pcie_do_recovery() so that status is updated with the return value from reset_link(); this was to fix the problem where we would wrongly report recovery failure, despite a successful reset_link(), whenever the initial error status is PCI_ERS_RESULT_DISCONNECT or PCI_ERS_RESULT_NO_AER_DRIVER. Unfortunately this breaks the flow of pcie_do_recovery() as it prevents the actions needed when the initial error is PCI_ERS_RESULT_CAN_RECOVER or PCI_ERS_RESULT_NEED_RESET from taking place which causes error recovery to fail. Don't clobber status after reset_link() to restore the intended flow in pcie_do_recovery(). Fix the original problem by saving the return value from reset_link() and use it later on to decide whether error recovery should be deemed successful in the scenarios where the initial error status is PCI_ERS_RESULT_{DISCONNECT,NO_AER_DRIVER}. Fixes: 6d2c89441571 ("PCI/ERR: Update error status after reset_link()") Signed-off-by: Hedi Berriche <hedi.berriche@hpe.com> Cc: Russ Anderson <rja@hpe.com> Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Joerg Roedel <jroedel@suse.com> Cc: stable@kernel.org # v5.7+ --- drivers/pci/pcie/err.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-)