Message ID | 20200927082736.14633-5-haifeng.zhao@intel.com (mailing list archive) |
---|---|
State | Superseded, archived |
Series | Fix DPC hotplug race and enhance error handling |
On Sun, 2020-09-27 at 04:27 -0400, Ethan Zhao wrote:
> When uncorrectable error happens, AER driver and DPC driver interrupt
> handlers likely call
>
>    pcie_do_recovery()
>    ->pci_walk_bus()
>    ->report_frozen_detected()
>
> with pci_channel_io_frozen the same time.
> If pci_dev_set_io_state() return true even if the original state is
> pci_channel_io_frozen, that will cause AER or DPC handler re-enter
> the error detecting and recovery procedure one after another.
> The result is the recovery flow mixed between AER and DPC.
> So simplify the pci_dev_set_io_state() function to only return true
> when dev->error_state is changed.
>
> Signed-off-by: Ethan Zhao <haifeng.zhao@intel.com>
> Tested-by: Wen Jin <wen.jin@intel.com>
> Tested-by: Shanshan Zhang <ShanshanX.Zhang@intel.com>
> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Reviewed-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
> Reviewed-by: Joe Perches <joe@perches.com>

Hi Ethan/Haifeng.

Like Andy, I did not "review" this patch and sign it.
I merely suggested another simplification.

Please do not add -by: lines unless actually received by you.
Sorry for that offence, I should ask for your permission.

-----Original Message-----
From: Joe Perches <joe@perches.com>
Sent: Sunday, September 27, 2020 5:14 PM
To: Zhao, Haifeng <haifeng.zhao@intel.com>; bhelgaas@google.com; oohall@gmail.com; ruscur@russell.cc; lukas@wunner.de; andriy.shevchenko@linux.intel.com; stuart.w.hayes@gmail.com; mr.nuke.me@gmail.com; mika.westerberg@linux.intel.com
Cc: linux-pci@vger.kernel.org; linux-kernel@vger.kernel.org; Jia, Pei P <pei.p.jia@intel.com>; ashok.raj@linux.intel.com; Kuppuswamy, Sathyanarayanan <sathyanarayanan.kuppuswamy@intel.com>; hch@infradead.org
Subject: Re: [PATCH 4/5 V4] PCI: only return true when dev io state is really changed

On Sun, 2020-09-27 at 04:27 -0400, Ethan Zhao wrote:
> When uncorrectable error happens, AER driver and DPC driver interrupt
> handlers likely call
>
>    pcie_do_recovery()
>    ->pci_walk_bus()
>    ->report_frozen_detected()
>
> with pci_channel_io_frozen the same time.
> If pci_dev_set_io_state() return true even if the original state is
> pci_channel_io_frozen, that will cause AER or DPC handler re-enter the
> error detecting and recovery procedure one after another.
> The result is the recovery flow mixed between AER and DPC.
> So simplify the pci_dev_set_io_state() function to only return true
> when dev->error_state is changed.
>
> Signed-off-by: Ethan Zhao <haifeng.zhao@intel.com>
> Tested-by: Wen Jin <wen.jin@intel.com>
> Tested-by: Shanshan Zhang <ShanshanX.Zhang@intel.com>
> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Reviewed-by: Alexandru Gagniuc <mr.nuke.me@gmail.com>
> Reviewed-by: Joe Perches <joe@perches.com>

Hi Ethan/Haifeng.

Like Andy, I did not "review" this patch and sign it.
I merely suggested another simplification.

Please do not add -by: lines unless actually received by you.
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index fa12f7cbc1a0..a2c1c7d5f494 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -359,39 +359,12 @@ struct pci_sriov {
 static inline bool pci_dev_set_io_state(struct pci_dev *dev,
 					 pci_channel_state_t new)
 {
-	bool changed = false;
-
 	device_lock_assert(&dev->dev);
-	switch (new) {
-	case pci_channel_io_perm_failure:
-		switch (dev->error_state) {
-		case pci_channel_io_frozen:
-		case pci_channel_io_normal:
-		case pci_channel_io_perm_failure:
-			changed = true;
-			break;
-		}
-		break;
-	case pci_channel_io_frozen:
-		switch (dev->error_state) {
-		case pci_channel_io_frozen:
-		case pci_channel_io_normal:
-			changed = true;
-			break;
-		}
-		break;
-	case pci_channel_io_normal:
-		switch (dev->error_state) {
-		case pci_channel_io_frozen:
-		case pci_channel_io_normal:
-			changed = true;
-			break;
-		}
-		break;
-	}
-	if (changed)
-		dev->error_state = new;
-	return changed;
+	if (dev->error_state == new)
+		return false;
+
+	dev->error_state = new;
+	return true;
 }
 
 static inline int pci_dev_set_disconnected(struct pci_dev *dev, void *unused)