Message ID | 1490102658-22768-1-git-send-email-caoj.fnst@cn.fujitsu.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Hi,

On Tue, Mar 21, 2017 at 9:24 PM, Cao jin <caoj.fnst@cn.fujitsu.com> wrote:
> Include whitespace fixes, a correction, a typo fix, and removal of a
> superfluous word.
>
> diff --git a/Documentation/PCI/pci-error-recovery.txt b/Documentation/PCI/pci-error-recovery.txt
> index da3b217..0b6bb3e 100644
> --- a/Documentation/PCI/pci-error-recovery.txt
> +++ b/Documentation/PCI/pci-error-recovery.txt
>
> @@ -231,14 +231,14 @@ proceeds to STEP 4 (Slot Reset)
> STEP 3: Link Reset
> ------------------
> The platform resets the link. This is a PCI-Express specific step
> -and is done whenever a non-fatal error has been detected that can be
> +and is done whenever a fatal error has been detected that can be
> "solved" by resetting the link.

First: I thought I saw a patch a few months ago that proposed removing
the link reset step. I don't know if the patch was accepted or not.

If link resets are still supported, then they can only fix NON-fatal errors:
basically, one resets the link, and only the link; one does NOT reset
either the device driver or the device state. The idea is that after a link
reset, communications with the device can immediately resume right
where they left off. (This can be hard in practice if the driver/firmware
doesn't know what it was doing when the error occurred. This might be why
no one implements it.) Anyway, the whole point of a link reset is that it is
explicitly for a non-fatal error.

--linas
CC MST, who touched this file in the last commit on it.

On 03/22/2017 01:48 PM, Linas Vepstas wrote:
> Hi,
>
> On Tue, Mar 21, 2017 at 9:24 PM, Cao jin <caoj.fnst@cn.fujitsu.com> wrote:
>> Include whitespace fixes, a correction, a typo fix, and removal of a
>> superfluous word.
>
>>
>> diff --git a/Documentation/PCI/pci-error-recovery.txt b/Documentation/PCI/pci-error-recovery.txt
>> index da3b217..0b6bb3e 100644
>> --- a/Documentation/PCI/pci-error-recovery.txt
>> +++ b/Documentation/PCI/pci-error-recovery.txt
>>
>> @@ -231,14 +231,14 @@ proceeds to STEP 4 (Slot Reset)
>> STEP 3: Link Reset
>> ------------------
>> The platform resets the link. This is a PCI-Express specific step
>> -and is done whenever a non-fatal error has been detected that can be
>> +and is done whenever a fatal error has been detected that can be
>> "solved" by resetting the link.
>
> First: I thought I saw a patch a few months ago that proposed removing
> the link reset step. I don't know if the patch was accepted or not.
>

Yes, I sent that one, and I asked for it to be ignored. At that time the
.link_reset handler still existed; now it is gone.

> If link resets are still supported, then they can only fix NON-fatal errors:
> basically, one resets the link, and only the link; one does NOT reset
> either the device driver or the device state. The idea is that after a link
> reset, communications with the device can immediately resume right
> where they left off. (This can be hard in practice if the driver/firmware
> doesn't know what it was doing when the error occurred. This might be why
> no one implements it.) Anyway, the whole point of a link reset is that it is
> explicitly for a non-fatal error.
>

Perhaps you are still talking about link re-training. After the last commit
on this file, the "Link Reset" section seems to focus only on PCI-Express.

If it is a PCI-Express specific step, then I think the "fatal" and
"non-fatal" referred to here are in PCI-Express (AER) territory, and
according to the AER driver (function do_recovery()), the platform-level
link reset targets fatal errors.
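As a rough illustration of the distinction being argued here, the following minimal C sketch models the reading of do_recovery() given above: only a fatal error leads to a platform-level link reset. The enum and function names are hypothetical and are not the kernel's identifiers, and the real AER driver logic is more involved than this.

```c
#include <stdbool.h>

/*
 * Hypothetical severity levels, loosely modeled on the AER
 * correctable / non-fatal / fatal classification. These are
 * illustrative names, not the kernel's constants.
 */
enum demo_severity {
    DEMO_SEV_CORRECTABLE,
    DEMO_SEV_NONFATAL,
    DEMO_SEV_FATAL
};

/*
 * Simplified model of the claim in this thread: the platform
 * resets the link only when the error is fatal.
 */
bool demo_needs_link_reset(enum demo_severity sev)
{
    return sev == DEMO_SEV_FATAL;
}
```

Under this model, demo_needs_link_reset() is true only for DEMO_SEV_FATAL, which matches the wording the patch introduces in STEP 3.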
On Tue, 21 Mar 2017 21:24:18 +0800
Cao jin <caoj.fnst@cn.fujitsu.com> wrote:

> Include whitespace fixes, a correction, a typo fix, and removal of a
> superfluous word.
>
> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
> ---
> This patch was sent last December, but it was not quite suitable at that
> time because the link reset step was unclear. Now that the "Link Reset"
> section has been cleaned up, I am submitting this patch again.

I've gone ahead and applied this to the docs tree, thanks.

jon
diff --git a/Documentation/PCI/pci-error-recovery.txt b/Documentation/PCI/pci-error-recovery.txt
index da3b217..0b6bb3e 100644
--- a/Documentation/PCI/pci-error-recovery.txt
+++ b/Documentation/PCI/pci-error-recovery.txt
@@ -11,7 +11,7 @@
 
 Many PCI bus controllers are able to detect a variety of hardware
 PCI errors on the bus, such as parity errors on the data and address
-busses, as well as SERR and PERR errors. Some of the more advanced
+buses, as well as SERR and PERR errors. Some of the more advanced
 chipsets are able to deal with these errors; these include PCI-E chipsets,
 and the PCI-host bridges found on IBM Power4, Power5 and Power6-based
 pSeries boxes. A typical action taken is to disconnect the affected device,
@@ -173,7 +173,7 @@ is STEP 6 (Permanent Failure).
 >>> a value of 0xff on read, and writes will be dropped. If more than
 >>> EEH_MAX_FAILS I/O's are attempted to a frozen adapter, EEH
 >>> assumes that the device driver has gone into an infinite loop
->>> and prints an error to syslog. A reboot is then required to
+>>> and prints an error to syslog. A reboot is then required to
 >>> get the device working again.
 
 STEP 2: MMIO Enabled
@@ -231,14 +231,14 @@ proceeds to STEP 4 (Slot Reset)
 
 STEP 3: Link Reset
 ------------------
 The platform resets the link. This is a PCI-Express specific step
-and is done whenever a non-fatal error has been detected that can be
+and is done whenever a fatal error has been detected that can be
 "solved" by resetting the link.
 
 STEP 4: Slot Reset
 ------------------
 In response to a return value of PCI_ERS_RESULT_NEED_RESET, the
-the platform will perform a slot reset on the requesting PCI device(s).
+the platform will perform a slot reset on the requesting PCI device(s).
 The actual steps taken by a platform to perform a slot reset
 will be platform-dependent. Upon completion of slot reset, the
 platform will call the device slot_reset() callback.
@@ -258,7 +258,7 @@ configuration registers to initialize to their default conditions.
 
 For most PCI devices, a soft reset will be sufficient for recovery.
 Optional fundamental reset is provided to support a limited number
-of PCI Express PCI devices for which a soft reset is not sufficient
+of PCI Express devices for which a soft reset is not sufficient
 for recovery.
 
 If the platform supports PCI hotplug, then the reset might be
@@ -303,7 +303,7 @@ driver performs device init only from PCI function 0:
 Same as above.
 
 Drivers for PCI Express cards that require a fundamental reset must
-set the needs_freset bit in the pci_dev structure in their probe function.
+set the needs_freset bit in the pci_dev structure in their probe function.
 For example, the QLogic qla2xxx driver sets the needs_freset bit for certain
 PCI card types:
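
For readers unfamiliar with the needs_freset bit mentioned in the context of the last hunk, here is a small user-space sketch of the idea: a probe routine flags certain card types, and the platform later chooses a fundamental reset instead of a soft reset for them. Everything in it (struct demo_pci_dev, the device ID, the function names) is a hypothetical stand-in, not the kernel's struct pci_dev or the qla2xxx code.

```c
#include <stdbool.h>

/* Hypothetical device ID for a card type that needs a fundamental reset. */
#define DEMO_DEVICE_NEEDS_FRESET 0x1234u

/*
 * Hypothetical stand-in for the kernel's struct pci_dev; only the
 * fields needed for this illustration are modeled.
 */
struct demo_pci_dev {
    unsigned int device;   /* PCI device ID */
    bool needs_freset;     /* set by the driver's probe routine */
};

enum demo_reset_kind {
    DEMO_SOFT_RESET,
    DEMO_FUNDAMENTAL_RESET
};

/* Probe: request a fundamental reset only for the affected card type. */
void demo_probe(struct demo_pci_dev *pdev)
{
    if (pdev->device == DEMO_DEVICE_NEEDS_FRESET)
        pdev->needs_freset = true;
}

/* Platform side: a soft reset suffices unless the driver asked otherwise. */
enum demo_reset_kind demo_pick_reset(const struct demo_pci_dev *pdev)
{
    return pdev->needs_freset ? DEMO_FUNDAMENTAL_RESET : DEMO_SOFT_RESET;
}
```

The point of the sketch is only that the decision is made once, at probe time, and consulted later during slot reset; how a real platform performs either reset is platform-dependent, as the document itself says.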
Include whitespace fixes, a correction, a typo fix, and removal of a
superfluous word.

Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
---
This patch was sent last December, but it was not quite suitable at that
time because the link reset step was unclear. Now that the "Link Reset"
section has been cleaned up, I am submitting this patch again.

 Documentation/PCI/pci-error-recovery.txt | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)