Message ID | 20200925023423.42675-5-haifeng.zhao@intel.com (mailing list archive) |
---|---|
State | Superseded, archived |
Delegated to: | Bjorn Helgaas |
Series | Fix DPC hotplug race and enhance error hanlding |
On Thu, Sep 24, 2020 at 10:34:22PM -0400, Ethan Zhao wrote:
> When uncorrectable error happens, AER driver and DPC driver interrupt
> handlers likely call
> pcie_do_recovery()->pci_walk_bus()->report_frozen_detected() with
> pci_channel_io_frozen the same time.

Call chains are better to read if they split like

  foo() ->
    bar() ->
      baz()

> If pci_dev_set_io_state() return true even if the original state is
> pci_channel_io_frozen, that will cause AER or DPC handler re-enter
> the error detecting and recovery procedure one after another.
> The result is the recovery flow mixed between AER and DPC.
> So simplify the pci_dev_set_io_state() function to only return true
> when dev->error_state is changed.

...

> +	if (dev->error_state != new) {
>  		dev->error_state = new;
> +		changed = true;
> +	}
>  	return changed;

Perhaps

	if (dev->error_state == new)
		return changed;

	dev->error_state = new;
	return true;

?
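As a side note on the suggestion above, the two forms should behave identically. The following is a minimal userspace sketch of that comparison, not kernel code: `enum chan_state`, `struct model_dev`, and all function names are hypothetical stand-ins, and the `assert()` merely takes the place of `device_lock_assert()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
enum chan_state { IO_NORMAL = 1, IO_FROZEN, IO_PERM_FAILURE };

struct model_dev {
	enum chan_state error_state;
};

/* Form used in the patch: set the flag inside the if block. */
static bool set_state_patch(struct model_dev *dev, enum chan_state new)
{
	bool changed = false;

	assert(dev != NULL);
	if (dev->error_state != new) {
		dev->error_state = new;
		changed = true;
	}
	return changed;
}

/* Form suggested in the review: return early when nothing changes. */
static bool set_state_early_return(struct model_dev *dev, enum chan_state new)
{
	bool changed = false;

	assert(dev != NULL);
	if (dev->error_state == new)
		return changed;

	dev->error_state = new;
	return true;
}

/* Try every (old, new) pair and compare return value and final state. */
static bool variants_agree(void)
{
	static const enum chan_state states[] = {
		IO_NORMAL, IO_FROZEN, IO_PERM_FAILURE
	};
	size_t n = sizeof(states) / sizeof(states[0]);

	for (size_t i = 0; i < n; i++) {
		for (size_t j = 0; j < n; j++) {
			struct model_dev a = { states[i] };
			struct model_dev b = { states[i] };
			bool ra = set_state_patch(&a, states[j]);
			bool rb = set_state_early_return(&b, states[j]);

			if (ra != rb || a.error_state != b.error_state)
				return false;
		}
	}
	return true;
}
```

In this model `variants_agree()` returns true, so the choice between the two forms is purely one of readability.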
Hi Ethan,

On 9/24/20 9:34 PM, Ethan Zhao wrote:
> When uncorrectable error happens, AER driver and DPC driver interrupt
> handlers likely call
> pcie_do_recovery()->pci_walk_bus()->report_frozen_detected() with
> pci_channel_io_frozen the same time.
> If pci_dev_set_io_state() return true even if the original state is
> pci_channel_io_frozen, that will cause AER or DPC handler re-enter
> the error detecting and recovery procedure one after another.
> The result is the recovery flow mixed between AER and DPC.
> So simplify the pci_dev_set_io_state() function to only return true
> when dev->error_state is changed.
>
> Signed-off-by: Ethan Zhao <haifeng.zhao@intel.com>
> Tested-by: Wen jin <wen.jin@intel.com>
> Tested-by: Shanshan Zhang <ShanshanX.Zhang@intel.com>
> ---
>  drivers/pci/pci.h | 31 +++----------------------------
>  1 file changed, 3 insertions(+), 28 deletions(-)
>
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index fa12f7cbc1a0..d420bb977f3b 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -362,35 +362,10 @@ static inline bool pci_dev_set_io_state(struct pci_dev *dev,
>  	bool changed = false;
>
>  	device_lock_assert(&dev->dev);
> -	switch (new) {
> -	case pci_channel_io_perm_failure:
> -		switch (dev->error_state) {
> -		case pci_channel_io_frozen:
> -		case pci_channel_io_normal:
> -		case pci_channel_io_perm_failure:
> -			changed = true;
> -			break;
> -		}
> -		break;
> -	case pci_channel_io_frozen:
> -		switch (dev->error_state) {
> -		case pci_channel_io_frozen:
> -		case pci_channel_io_normal:
> -			changed = true;
> -			break;
> -		}
> -		break;
> -	case pci_channel_io_normal:
> -		switch (dev->error_state) {
> -		case pci_channel_io_frozen:
> -		case pci_channel_io_normal:
> -			changed = true;
> -			break;
> -		}
> -		break;
> -	}
> -	if (changed)
> +	if (dev->error_state != new) {
>  		dev->error_state = new;
> +		changed = true;
> +	}
>  	return changed;
>  }

The flow is a lot easier to follow now. Thank you.

Reviewed-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
Yes, better!

-----Original Message-----
From: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Sent: Friday, September 25, 2020 8:38 PM
To: Zhao, Haifeng <haifeng.zhao@intel.com>
Cc: bhelgaas@google.com; oohall@gmail.com; ruscur@russell.cc; lukas@wunner.de; stuart.w.hayes@gmail.com; mr.nuke.me@gmail.com; mika.westerberg@linux.intel.com; linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org; Jia, Pei P <pei.p.jia@intel.com>
Subject: Re: [PATCH 4/5] PCI: only return true when dev io state is really changed

On Thu, Sep 24, 2020 at 10:34:22PM -0400, Ethan Zhao wrote:
> When uncorrectable error happens, AER driver and DPC driver interrupt
> handlers likely call
> pcie_do_recovery()->pci_walk_bus()->report_frozen_detected() with
> pci_channel_io_frozen the same time.

Call chains are better to read if they split like

  foo() ->
    bar() ->
      baz()

> If pci_dev_set_io_state() return true even if the original state is
> pci_channel_io_frozen, that will cause AER or DPC handler re-enter the
> error detecting and recovery procedure one after another.
> The result is the recovery flow mixed between AER and DPC.
> So simplify the pci_dev_set_io_state() function to only return true
> when dev->error_state is changed.

...

> +	if (dev->error_state != new) {
>  		dev->error_state = new;
> +		changed = true;
> +	}
>  	return changed;

Perhaps

	if (dev->error_state == new)
		return changed;

	dev->error_state = new;
	return true;

?

--
With Best Regards,
Andy Shevchenko
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index fa12f7cbc1a0..d420bb977f3b 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -362,35 +362,10 @@ static inline bool pci_dev_set_io_state(struct pci_dev *dev,
 	bool changed = false;
 
 	device_lock_assert(&dev->dev);
-	switch (new) {
-	case pci_channel_io_perm_failure:
-		switch (dev->error_state) {
-		case pci_channel_io_frozen:
-		case pci_channel_io_normal:
-		case pci_channel_io_perm_failure:
-			changed = true;
-			break;
-		}
-		break;
-	case pci_channel_io_frozen:
-		switch (dev->error_state) {
-		case pci_channel_io_frozen:
-		case pci_channel_io_normal:
-			changed = true;
-			break;
-		}
-		break;
-	case pci_channel_io_normal:
-		switch (dev->error_state) {
-		case pci_channel_io_frozen:
-		case pci_channel_io_normal:
-			changed = true;
-			break;
-		}
-		break;
-	}
-	if (changed)
+	if (dev->error_state != new) {
 		dev->error_state = new;
+		changed = true;
+	}
 	return changed;
 }
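
To illustrate the behaviour change in the hunk above, here is a rough userspace model of the function before and after the patch. It is only a sketch under stated assumptions: `enum chan_state`, `struct model_dev`, and every function name are hypothetical stand-ins, the old logic is a condensed form of the removed switch statement, and no locking or real concurrency between AER and DPC is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
enum chan_state { IO_NORMAL = 1, IO_FROZEN, IO_PERM_FAILURE };

struct model_dev {
	enum chan_state error_state;
};

typedef bool (*set_state_fn)(struct model_dev *, enum chan_state);

/* Pre-patch logic, condensed from the removed nested switch. */
static bool set_io_state_old(struct model_dev *dev, enum chan_state new)
{
	bool changed = false;

	assert(dev != NULL);
	switch (new) {
	case IO_PERM_FAILURE:
		/* accepted from every state, including perm_failure itself */
		changed = true;
		break;
	case IO_FROZEN:
	case IO_NORMAL:
		/* accepted from normal or frozen, even if nothing changes */
		changed = dev->error_state != IO_PERM_FAILURE;
		break;
	}
	if (changed)
		dev->error_state = new;
	return changed;
}

/* Post-patch logic: report true only when the stored state changes. */
static bool set_io_state_new(struct model_dev *dev, enum chan_state new)
{
	bool changed = false;

	assert(dev != NULL);
	if (dev->error_state != new) {
		dev->error_state = new;
		changed = true;
	}
	return changed;
}

/* Two back-to-back "frozen" reports, e.g. one from AER and one from DPC. */
static int frozen_reports_accepted(set_state_fn set_state)
{
	struct model_dev dev = { .error_state = IO_NORMAL };
	int accepted = 0;

	accepted += set_state(&dev, IO_FROZEN);	/* first handler */
	accepted += set_state(&dev, IO_FROZEN);	/* second handler */
	return accepted;
}

/* Can a device in perm_failure be moved back to normal? */
static bool perm_failure_can_be_left(set_state_fn set_state)
{
	struct model_dev dev = { .error_state = IO_PERM_FAILURE };

	return set_state(&dev, IO_NORMAL);
}
```

In this model `frozen_reports_accepted()` yields 2 for the old logic and 1 for the new one, which matches the commit message: the second handler no longer sees `true` for a device that is already frozen. The model also shows one side effect that may deserve attention: the old logic refused to leave `perm_failure`, while the simplified check accepts that transition.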