diff mbox series

[v7,4/5] PCI: only return true when dev io state is really changed

Message ID 20201003075514.32935-5-haifeng.zhao@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Bjorn Helgaas
Headers show
Series Fix DPC hotplug race and enhance error handling | expand

Commit Message

Zhao, Haifeng Oct. 3, 2020, 7:55 a.m. UTC
When uncorrectable error happens, AER driver and DPC driver interrupt
handlers likely call

   pcie_do_recovery()
   ->pci_walk_bus()
     ->report_frozen_detected()

with pci_channel_io_frozen the same time.
   If pci_dev_set_io_state() return true even if the original state is
pci_channel_io_frozen, that will cause AER or DPC handler re-enter
the error detecting and recovery procedure one after another.
   The result is the recovery flow mixed between AER and DPC.
So simplify the pci_dev_set_io_state() function to only return true
when dev->error_state is really changed.

Signed-off-by: Ethan Zhao <haifeng.zhao@intel.com>
Tested-by: Wen Jin <wen.jin@intel.com>
Tested-by: Shanshan Zhang <ShanshanX.Zhang@intel.com>
Reviewed-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
---
Changnes:
 v2: revise description and code according to suggestion from Andy.
 v3: change code to simpler.
 v4: no change.
 v5: no change.
 v6: no change.
 v7: changed based on Bjorn's code and truth table.

 drivers/pci/pci.h | 53 ++++++++++++++++++-----------------------------
 1 file changed, 20 insertions(+), 33 deletions(-)

Comments

Bjorn Helgaas Oct. 3, 2020, 4:44 p.m. UTC | #1
On Sat, Oct 03, 2020 at 03:55:13AM -0400, Ethan Zhao wrote:
> When uncorrectable error happens, AER driver and DPC driver interrupt
> handlers likely call
> 
>    pcie_do_recovery()
>    ->pci_walk_bus()
>      ->report_frozen_detected()
> 
> with pci_channel_io_frozen the same time.
>    If pci_dev_set_io_state() return true even if the original state is
> pci_channel_io_frozen, that will cause AER or DPC handler re-enter
> the error detecting and recovery procedure one after another.
>    The result is the recovery flow mixed between AER and DPC.
> So simplify the pci_dev_set_io_state() function to only return true
> when dev->error_state is really changed.
> 
> Signed-off-by: Ethan Zhao <haifeng.zhao@intel.com>
> Tested-by: Wen Jin <wen.jin@intel.com>
> Tested-by: Shanshan Zhang <ShanshanX.Zhang@intel.com>
> Reviewed-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
> ---
> Changnes:
>  v2: revise description and code according to suggestion from Andy.
>  v3: change code to simpler.
>  v4: no change.
>  v5: no change.
>  v6: no change.
>  v7: changed based on Bjorn's code and truth table.
> 
>  drivers/pci/pci.h | 53 ++++++++++++++++++-----------------------------
>  1 file changed, 20 insertions(+), 33 deletions(-)
> 
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 455b32187abd..47af1ff2a286 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -354,44 +354,31 @@ struct pci_sriov {
>   *
>   * Must be called with device_lock held.
>   *
> - * Returns true if state has been changed to the requested state.
> + * Returns true if state has been really changed to the requested state.
>   */
>  static inline bool pci_dev_set_io_state(struct pci_dev *dev,
>  					pci_channel_state_t new)
>  {
> -	bool changed = false;
> -
>  	device_lock_assert(&dev->dev);
> -	switch (new) {
> -	case pci_channel_io_perm_failure:
> -		switch (dev->error_state) {
> -		case pci_channel_io_frozen:
> -		case pci_channel_io_normal:
> -		case pci_channel_io_perm_failure:
> -			changed = true;
> -			break;
> -		}
> -		break;
> -	case pci_channel_io_frozen:
> -		switch (dev->error_state) {
> -		case pci_channel_io_frozen:
> -		case pci_channel_io_normal:
> -			changed = true;
> -			break;
> -		}
> -		break;
> -	case pci_channel_io_normal:
> -		switch (dev->error_state) {
> -		case pci_channel_io_frozen:
> -		case pci_channel_io_normal:
> -			changed = true;
> -			break;
> -		}
> -		break;
> -	}
> -	if (changed)
> -		dev->error_state = new;
> -	return changed;
> +
> +/*
> + *			Truth table:
> + *			requested new state
> + *     current          ------------------------------------------
> + *     state            normal         frozen         perm_failure
> + *     ------------  +  -------------  -------------  ------------
> + *     normal        |  normal         frozen         perm_failure
> + *     frozen        |  normal         frozen         perm_failure
> + *     perm_failure  |  perm_failure*  perm_failure*  perm_failure
> + */
> +
> +	if (dev->error_state == pci_channel_io_perm_failure)
> +		return false;
> +	else if (dev->error_state == new)
> +		return false;
> +
> +	dev->error_state = new;
> +	return true;

No, you missed the point.  I want

  1) One patch that converts the "switch" to the shorter "if"
     statements.  This one will be big and ugly, but should not change
     the functionality at all, and it should be pretty easy to verify
     that since there aren't very many states involved.

     Since this one is pure code simplification, the commit log won't
     say anything at all about AER or DPC or their requirements
     because it's not changing any behavior.

  2) A separate patch that's tiny and makes whatever functional change
     you need.

>  }
>  
>  static inline int pci_dev_set_disconnected(struct pci_dev *dev, void *unused)
> -- 
> 2.18.4
>
Ethan Zhao Oct. 7, 2020, 7:50 a.m. UTC | #2
Bjorn,

On Sun, Oct 4, 2020 at 12:44 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> On Sat, Oct 03, 2020 at 03:55:13AM -0400, Ethan Zhao wrote:
> > When uncorrectable error happens, AER driver and DPC driver interrupt
> > handlers likely call
> >
> >    pcie_do_recovery()
> >    ->pci_walk_bus()
> >      ->report_frozen_detected()
> >
> > with pci_channel_io_frozen the same time.
> >    If pci_dev_set_io_state() return true even if the original state is
> > pci_channel_io_frozen, that will cause AER or DPC handler re-enter
> > the error detecting and recovery procedure one after another.
> >    The result is the recovery flow mixed between AER and DPC.
> > So simplify the pci_dev_set_io_state() function to only return true
> > when dev->error_state is really changed.
> >
> > Signed-off-by: Ethan Zhao <haifeng.zhao@intel.com>
> > Tested-by: Wen Jin <wen.jin@intel.com>
> > Tested-by: Shanshan Zhang <ShanshanX.Zhang@intel.com>
> > Reviewed-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
> > Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com>
> > ---
> > Changnes:
> >  v2: revise description and code according to suggestion from Andy.
> >  v3: change code to simpler.
> >  v4: no change.
> >  v5: no change.
> >  v6: no change.
> >  v7: changed based on Bjorn's code and truth table.
> >
> >  drivers/pci/pci.h | 53 ++++++++++++++++++-----------------------------
> >  1 file changed, 20 insertions(+), 33 deletions(-)
> >
> > diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> > index 455b32187abd..47af1ff2a286 100644
> > --- a/drivers/pci/pci.h
> > +++ b/drivers/pci/pci.h
> > @@ -354,44 +354,31 @@ struct pci_sriov {
> >   *
> >   * Must be called with device_lock held.
> >   *
> > - * Returns true if state has been changed to the requested state.
> > + * Returns true if state has been really changed to the requested state.
> >   */
> >  static inline bool pci_dev_set_io_state(struct pci_dev *dev,
> >                                       pci_channel_state_t new)
> >  {
> > -     bool changed = false;
> > -
> >       device_lock_assert(&dev->dev);
> > -     switch (new) {
> > -     case pci_channel_io_perm_failure:
> > -             switch (dev->error_state) {
> > -             case pci_channel_io_frozen:
> > -             case pci_channel_io_normal:
> > -             case pci_channel_io_perm_failure:
> > -                     changed = true;
> > -                     break;
> > -             }
> > -             break;
> > -     case pci_channel_io_frozen:
> > -             switch (dev->error_state) {
> > -             case pci_channel_io_frozen:
> > -             case pci_channel_io_normal:
> > -                     changed = true;
> > -                     break;
> > -             }
> > -             break;
> > -     case pci_channel_io_normal:
> > -             switch (dev->error_state) {
> > -             case pci_channel_io_frozen:
> > -             case pci_channel_io_normal:
> > -                     changed = true;
> > -                     break;
> > -             }
> > -             break;
> > -     }
> > -     if (changed)
> > -             dev->error_state = new;
> > -     return changed;
> > +
> > +/*
> > + *                   Truth table:
> > + *                   requested new state
> > + *     current          ------------------------------------------
> > + *     state            normal         frozen         perm_failure
> > + *     ------------  +  -------------  -------------  ------------
> > + *     normal        |  normal         frozen         perm_failure
> > + *     frozen        |  normal         frozen         perm_failure
> > + *     perm_failure  |  perm_failure*  perm_failure*  perm_failure
> > + */
> > +
> > +     if (dev->error_state == pci_channel_io_perm_failure)
> > +             return false;
> > +     else if (dev->error_state == new)
> > +             return false;
> > +
> > +     dev->error_state = new;
> > +     return true;
>
> No, you missed the point.  I want
>
>   1) One patch that converts the "switch" to the shorter "if"
>      statements.  This one will be big and ugly, but should not change
>      the functionality at all, and it should be pretty easy to verify
>      that since there aren't very many states involved.
>
>      Since this one is pure code simplification, the commit log won't
>      say anything at all about AER or DPC or their requirements
>      because it's not changing any behavior.
>
>   2) A separate patch that's tiny and makes whatever functional change
>      you need.

       Make sense, clear,  this time.

     Thanks,
     Ethan
>
> >  }
> >
> >  static inline int pci_dev_set_disconnected(struct pci_dev *dev, void *unused)
> > --
> > 2.18.4
> >
diff mbox series

Patch

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 455b32187abd..47af1ff2a286 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -354,44 +354,31 @@  struct pci_sriov {
  *
  * Must be called with device_lock held.
  *
- * Returns true if state has been changed to the requested state.
+ * Returns true if state has been really changed to the requested state.
  */
 static inline bool pci_dev_set_io_state(struct pci_dev *dev,
 					pci_channel_state_t new)
 {
-	bool changed = false;
-
 	device_lock_assert(&dev->dev);
-	switch (new) {
-	case pci_channel_io_perm_failure:
-		switch (dev->error_state) {
-		case pci_channel_io_frozen:
-		case pci_channel_io_normal:
-		case pci_channel_io_perm_failure:
-			changed = true;
-			break;
-		}
-		break;
-	case pci_channel_io_frozen:
-		switch (dev->error_state) {
-		case pci_channel_io_frozen:
-		case pci_channel_io_normal:
-			changed = true;
-			break;
-		}
-		break;
-	case pci_channel_io_normal:
-		switch (dev->error_state) {
-		case pci_channel_io_frozen:
-		case pci_channel_io_normal:
-			changed = true;
-			break;
-		}
-		break;
-	}
-	if (changed)
-		dev->error_state = new;
-	return changed;
+
+/*
+ *			Truth table:
+ *			requested new state
+ *     current          ------------------------------------------
+ *     state            normal         frozen         perm_failure
+ *     ------------  +  -------------  -------------  ------------
+ *     normal        |  normal         frozen         perm_failure
+ *     frozen        |  normal         frozen         perm_failure
+ *     perm_failure  |  perm_failure*  perm_failure*  perm_failure
+ */
+
+	if (dev->error_state == pci_channel_io_perm_failure)
+		return false;
+	else if (dev->error_state == new)
+		return false;
+
+	dev->error_state = new;
+	return true;
 }
 
 static inline int pci_dev_set_disconnected(struct pci_dev *dev, void *unused)