[v2,4/4] PCI/AER: Dont do recovery when DPC is enabled

Message ID	1510721808-27164-5-git-send-email-poza@codeaurora.org (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org> sender: poza@smtp.codeaurora.org) by smtp.codeaurora.org (Postfix) with ESMTPSA id 1E8BC6086D; Wed, 15 Nov 2017 04:57:08 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 smtp.codeaurora.org 1E8BC6086D From: Oza Pawandeep <poza@codeaurora.org> To: linux-pci@vger.kernel.org, okaya@codeaurora.org, timur@codeaurora.org Subject: [PATCH v2 4/4] PCI/AER: Dont do recovery when DPC is enabled Date: Wed, 15 Nov 2017 10:26:48 +0530 Message-Id: <1510721808-27164-5-git-send-email-poza@codeaurora.org> In-Reply-To: <1510721808-27164-1-git-send-email-poza@codeaurora.org> References: <1510721808-27164-1-git-send-email-poza@codeaurora.org> Precedence: list Cc: Oza Pawandeep <poza@codeaurora.org>, Gabriele Paoloni <gabriele.paoloni@huawei.com>, Greg Kroah-Hartman <gregkh@linuxfoundation.org>, Dongdong Liu <liudongdong3@huawei.com>, linux-arm-msm@vger.kernel.org, Bjorn Helgaas <bhelgaas@google.com>, Thomas Gleixner <tglx@linutronix.de>, linux-arm-kernel@lists.infradead.org MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-arm-kernel" <linux-arm-kernel-bounces@lists.infradead.org> Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org

Oza Pawandeep Nov. 15, 2017, 4:56 a.m. UTC

PCI Express Base Specification, Rev. 4.0 Version 0.9
6.2.10: Downstream Port Containment (DPC)

DPC is an optional normative feature of a Downstream Port. DPC halts PCI
Express traffic below a Downstream Port after an unmasked uncorrectable
error is detected at or below the Port, avoiding the potential spread of
any data corruption, and permitting error recovery if supported by
software

Triggering DPC disables its Link by directing the LTSSM to the Disabled
state. Once the LTSSM reaches the Disabled state, it remains in that
state until the DPC Trigger Status bit is Cleared

So when DPC service is active and registered to port driver, AER should
not attempt to recover, since DPC will be removing downstream devices,
and do the recovery.

Signed-off-by: Oza Pawandeep <poza@codeaurora.org>

Bjorn Helgaas Nov. 15, 2017, 9:14 p.m. UTC | #1

On Wed, Nov 15, 2017 at 10:26:48AM +0530, Oza Pawandeep wrote:
> PCI Express Base Specification, Rev. 4.0 Version 0.9
> 6.2.10: Downstream Port Containment (DPC)
> 
> DPC is an optional normative feature of a Downstream Port. DPC halts PCI
> Express traffic below a Downstream Port after an unmasked uncorrectable
> error is detected at or below the Port, avoiding the potential spread of
> any data corruption, and permitting error recovery if supported by
> software
> 
> Triggering DPC disables its Link by directing the LTSSM to the Disabled
> state. Once the LTSSM reaches the Disabled state, it remains in that
> state until the DPC Trigger Status bit is Cleared
> 
> So when DPC service is active and registered to port driver, AER should
> not attempt to recover, since DPC will be removing downstream devices,
> and do the recovery.
> 
> Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
> 
> diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
> index 7448052..a9108ea 100644
> --- a/drivers/pci/pcie/aer/aerdrv_core.c
> +++ b/drivers/pci/pcie/aer/aerdrv_core.c
> @@ -482,6 +482,27 @@ static pci_ers_result_t reset_link(struct pci_dev *dev)
>  }
>  
>  /**
> + * pcie_port_query_uptream_service - query upstream service
> + * @dev: pointer to a pci_dev data structure of agent detecting an error
> + * @service: service to be queried
> + *
> + * Invoked to know the status of the service for pci device.
> + */
> +static bool pcie_port_query_uptream_service(struct pci_dev *dev, u32 service)
> +{
> +	struct pci_dev *upstream_dev = dev;
> +
> +	do {
> +		if (pcie_port_query_service(upstream_dev, service))
> +			return true;
> +		upstream_dev = pcie_port_upstream_bridge(upstream_dev);
> +	} while (upstream_dev);
> +
> +	return false;
> +}
> +
> +
> +/**
>   * do_recovery - handle nonfatal/fatal error recovery process
>   * @dev: pointer to a pci_dev data structure of agent detecting an error
>   * @severity: error severity type
> @@ -495,6 +516,18 @@ static void do_recovery(struct pci_dev *dev, int severity)
>  	pci_ers_result_t status, result = PCI_ERS_RESULT_RECOVERED;
>  	enum pci_channel_state state;
>  
> +	/*
> +	 * If DPC is enabled, there is no need to attempt recovery.
> +	 * Since DPC disables its Link by directing the LTSSM to
> +	 * the Disabled state.
> +	 * DPC driver will take care of the recovery, there is no need
> +	 * for AER driver to race.
> +	 */
> +	if (pcie_port_query_uptream_service(dev, PCIE_PORT_SERVICE_DPC)) {
> +		dev_info(&dev->dev, "AER: Device recovery to be done by DPC\n");
> +		return;
> +	}

What happens without this test?

Does AER read registers from the now-disabled device and get ~0 data?
Or is AER reading registers from the port upstream from the disabled
device and trying to reset the device?

It looks like get_device_error_info() reads registers and doesn't
check to see whether it gets ~0 back.  I'm wondering if we *should* be
checking there and whether doing that would help mitigate the issue
here.

I don't really like the pcie_port_query_uptream_service() approach
(BTW, the name is misspelled) because it feels a little ad hoc and it
makes assumptions here in the AER code about what the DPC code is
doing, e.g., how it has configured PCI_EXP_DPC_CTL.

>  	if (severity == AER_FATAL)
>  		state = pci_channel_io_frozen;
>  	else
> -- 
> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.,
> a Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

Sinan Kaya Nov. 16, 2017, 2:03 p.m. UTC | #2

Hi Bjorn,

On 11/15/2017 4:14 PM, Bjorn Helgaas wrote:
>> +	if (pcie_port_query_uptream_service(dev, PCIE_PORT_SERVICE_DPC)) {
>> +		dev_info(&dev->dev, "AER: Device recovery to be done by DPC\n");
>> +		return;
>> +	}
> What happens without this test?
> 
> Does AER read registers from the now-disabled device and get ~0 data?
> Or is AER reading registers from the port upstream from the disabled
> device and trying to reset the device?
> 
> It looks like get_device_error_info() reads registers and doesn't
> check to see whether it gets ~0 back.  I'm wondering if we *should* be
> checking there and whether doing that would help mitigate the issue
> here.

The issue is two independent software entities are trying to recover the PCIe
link simultaneously. AER and DPC have two different approaches to link recovery.

AER makes a callback into the endpoint drivers for non-fatal errors and hope
that endpoint driver can recover the link. AER also makes a callback in the 
fatal error case but resets the link via secondary bus reset.

The DPC on the other hand stops the drivers immediately since HW took care of
link disable. (Endpoint register reads return ~0 at this point.) DPC driver clears
the interrupt from the DPC capability and brings the link up at the end. Full
enumeration/rescan follows this procedure to go back to functioning state. 

If we don't have this AER-DPC coordination, the endpoint driver gets confused since
it receives a stop command as well as a recover command at about the same time
depending on the timing.

Whether the AER driver reads ~0 or not really depends on timing. The link may come
up from the DPC driver by the time AER driver reaches here as an example.

Bad things do happen. We have seen this with e1000e driver.

Sinan

Bjorn Helgaas Nov. 16, 2017, 8:17 p.m. UTC | #3

On Thu, Nov 16, 2017 at 09:03:37AM -0500, Sinan Kaya wrote:
> On 11/15/2017 4:14 PM, Bjorn Helgaas wrote:
> >> +	if (pcie_port_query_uptream_service(dev, PCIE_PORT_SERVICE_DPC)) {
> >> +		dev_info(&dev->dev, "AER: Device recovery to be done by DPC\n");
> >> +		return;
> >> +	}
> > What happens without this test?
> > 
> > Does AER read registers from the now-disabled device and get ~0 data?
> > Or is AER reading registers from the port upstream from the disabled
> > device and trying to reset the device?
> > 
> > It looks like get_device_error_info() reads registers and doesn't
> > check to see whether it gets ~0 back.  I'm wondering if we *should* be
> > checking there and whether doing that would help mitigate the issue
> > here.
> 
> The issue is two independent software entities are trying to recover
> the PCIe link simultaneously. AER and DPC have two different
> approaches to link recovery.
> 
> AER makes a callback into the endpoint drivers for non-fatal errors
> and hope that endpoint driver can recover the link. AER also makes a
> callback in the fatal error case but resets the link via secondary
> bus reset.
> 
> The DPC on the other hand stops the drivers immediately since HW
> took care of link disable. (Endpoint register reads return ~0 at
> this point.) DPC driver clears the interrupt from the DPC capability
> and brings the link up at the end. Full enumeration/rescan follows
> this procedure to go back to functioning state. 
> 
> If we don't have this AER-DPC coordination, the endpoint driver gets
> confused since it receives a stop command as well as a recover
> command at about the same time depending on the timing.
> 
> Whether the AER driver reads ~0 or not really depends on timing. The
> link may come up from the DPC driver by the time AER driver reaches
> here as an example.
> 
> Bad things do happen. We have seen this with e1000e driver.

I don't doubt that bad things happen.  I'm just trying to understand
exactly *what* bad things happen and how, so we can fix them cleanly.

I don't know exactly what you mean by "DPC stops the drivers
immediately".  Since the DPC hardware disables the Link, I *think*
you probably mean that driver accesses to the device start failing
(whether the driver notices this is a whole different question).

When the DPC hardware disables the Link, it causes a hot reset for
downstream components.  The DPC interrupt_event_handler() doesn't do
much except remove the device (which detaches the driver) and clear
the DPC Trigger Status bit (which allows hardware to try to retrain
the Link).

So the "stop" and "recover" commands you mention must be related to
AER.  I guess these would be some of the driver callbacks
(error_detected(), mmio_enabled(), slot_reset(), reset_prepare(),
reset_done(), resume())?

In any case, I agree that it probably doesn't make sense to call any
of these callbacks if the DPC driver has already detached the driver
and re-attached it.  The device state is gone because of the hot reset
and the driver state is gone because of the detach/re-attach.

However, I'm not so sure about the period *before* the DPC driver
detaches the driver.  The description of error_detected() says it
cannot assume the device is accessible, so I think there might be an
argument that AER *should* call this for DPC events so the driver has
a chance to clean up before being unceremoniously detached.

I suspect this all probably requires tighter integration between DPC
and AER, and I'm totally fine with that.  I think the current
separation as separate "drivers" is pretty artificial anyway.

Bjorn

Sinan Kaya Nov. 16, 2017, 8:52 p.m. UTC | #4

>>
>> Whether the AER driver reads ~0 or not really depends on timing. The
>> link may come up from the DPC driver by the time AER driver reaches
>> here as an example.
>>
>> Bad things do happen. We have seen this with e1000e driver.
> 
> I don't doubt that bad things happen.  I'm just trying to understand
> exactly *what* bad things happen and how, so we can fix them cleanly.
> 

This was random crashes in the e1000e drivers accompanied with stack
traces coming from WARN and msi allocation routines.

> I don't know exactly what you mean by "DPC stops the drivers
> immediately".  Since the DPC hardware disables the Link, I *think*
> you probably mean that driver accesses to the device start failing
> (whether the driver notices this is a whole different question).

Sorry for not being clear.

I was talking about the pci_stop_and_remove_bus_device() call in DPC's
interrupt_event_handler() function as the "stop command".

> 
> When the DPC hardware disables the Link, it causes a hot reset for
> downstream components.  The DPC interrupt_event_handler() doesn't do
> much except remove the device (which detaches the driver) and clear
> the DPC Trigger Status bit (which allows hardware to try to retrain
> the Link).
> 

That's true.

> So the "stop" and "recover" commands you mention must be related to
> AER.  

I was talking about pci_stop_and_remove_bus_device() vs. error_detected()
as "stop" and "recover"

> I guess these would be some of the driver callbacks
> (error_detected(), mmio_enabled(), slot_reset(), reset_prepare(),
> reset_done(), resume())?
> 
> In any case, I agree that it probably doesn't make sense to call any
> of these callbacks if the DPC driver has already detached the driver
> and re-attached it.  The device state is gone because of the hot reset
> and the driver state is gone because of the detach/re-attach.
> 
> However, I'm not so sure about the period *before* the DPC driver
> detaches the driver.  The description of error_detected() says it
> cannot assume the device is accessible, so I think there might be an
> argument that AER *should* call this for DPC events so the driver has
> a chance to clean up before being unceremoniously detached.

Makes sense. Let us work on this.

> 
> I suspect this all probably requires tighter integration between DPC
> and AER, and I'm totally fine with that.  I think the current
> separation as separate "drivers" is pretty artificial anyway.

Got it. We will try to plumb DPC error handling into AER driver's error
handling mechanism.

Another question:

What do you think about the rescan following link up? The only entity
that does rescan today is hotplug after DPC recovery. There could be
a platform with DPC support but no hotplug support. 

How should we handle it?

We can call rescan from DPC all the time.

Bjorn Helgaas Nov. 18, 2017, 12:02 a.m. UTC | #5

On Thu, Nov 16, 2017 at 03:52:47PM -0500, Sinan Kaya wrote:
> >>
> >> Whether the AER driver reads ~0 or not really depends on timing. The
> >> link may come up from the DPC driver by the time AER driver reaches
> >> here as an example.
> >>
> >> Bad things do happen. We have seen this with e1000e driver.
> > 
> > I don't doubt that bad things happen.  I'm just trying to understand
> > exactly *what* bad things happen and how, so we can fix them cleanly.
> 
> This was random crashes in the e1000e drivers accompanied with stack
> traces coming from WARN and msi allocation routines.

I didn't look in detail, but I'm not sure there's sufficient locking
in the AER path to make it safe from concurrent device removal.  I
suspect AER could be improved both with respect to handling ~0 data
and this potential concurrency issue.

> > So the "stop" and "recover" commands you mention must be related to
> > AER.  
> 
> I was talking about pci_stop_and_remove_bus_device() vs. error_detected()
> as "stop" and "recover"

Thanks for clearing that up!

> > I suspect this all probably requires tighter integration between DPC
> > and AER, and I'm totally fine with that.  I think the current
> > separation as separate "drivers" is pretty artificial anyway.
> 
> Got it. We will try to plumb DPC error handling into AER driver's error
> handling mechanism.

Looking at the AER code today, I noticed it already uses "DPC" in
another sense.  I don't know what it stands for there (probably
"deferred" something), but I don't think it's "Downstream Port
Containment" :)

> What do you think about the rescan following link up? The only entity
> that does rescan today is hotplug after DPC recovery. There could be
> a platform with DPC support but no hotplug support. 
> 
> How should we handle it?

Good question.  If your system does support both DPC and hotplug, I
assume the link comes back up after you clear DPC Trigger Status.
Does pciehp notice that "link up" event and add the device back?

So I think your question is whether the DPC code should explicitly do
a rescan so that if we don't have pciehp, we'll still automatically
rediscover the device.  I dunno, maybe.  Seems like a plausible idea
anyway.

Bjorn

Sinan Kaya Nov. 19, 2017, 4:41 p.m. UTC | #6

On 11/17/2017 7:02 PM, Bjorn Helgaas wrote:
>> What do you think about the rescan following link up? The only entity
>> that does rescan today is hotplug after DPC recovery. There could be
>> a platform with DPC support but no hotplug support. 
>>
>> How should we handle it?
> Good question.  If your system does support both DPC and hotplug, I
> assume the link comes back up after you clear DPC Trigger Status.
> Does pciehp notice that "link up" event and add the device back?

that's correct. we see a link up indication from hotplug driver followed
by a rescan.

> 
> So I think your question is whether the DPC code should explicitly do
> a rescan so that if we don't have pciehp, we'll still automatically
> rediscover the device.  I dunno, maybe.  Seems like a plausible idea
> anyway.

OK, will add it to DPC driver for completeness.

David Laight Nov. 21, 2017, 4:25 p.m. UTC | #7

From: Sinan Kaya
> Sent: 16 November 2017 14:04
...
> The issue is two independent software entities are trying to recover the PCIe
> link simultaneously. AER and DPC have two different approaches to link recovery.
> 
> AER makes a callback into the endpoint drivers for non-fatal errors and hope
> that endpoint driver can recover the link. AER also makes a callback in the
> fatal error case but resets the link via secondary bus reset.
> 
> The DPC on the other hand stops the drivers immediately since HW took care of
> link disable. (Endpoint register reads return ~0 at this point.)

What happens if the 'user' driver doesn't define the error reporting callbacks?
It might be hardened against the ~0u returns from reads - so not OOPS.
It might be appropriate to call the remove() function instead.

> DPC driver clears
> the interrupt from the DPC capability and brings the link up at the end. Full
> enumeration/rescan follows this procedure to go back to functioning state.

That might not be a good idea, very likely it will fail again immediately.

	David

Sinan Kaya Nov. 21, 2017, 4:43 p.m. UTC | #8

On 11/21/2017 11:25 AM, David Laight wrote:
>> The DPC on the other hand stops the drivers immediately since HW took care of
>> link disable. (Endpoint register reads return ~0 at this point.)
> What happens if the 'user' driver doesn't define the error reporting callbacks?
> It might be hardened against the ~0u returns from reads - so not OOPS.
> It might be appropriate to call the remove() function instead.

This is what the DPC driver does in its interrupt handler. 

http://elixir.free-electrons.com/linux/latest/ident/interrupt_event_handler

My understanding is that this will eventually call the remove() function on the
endpoint driver eventually. 

Bjorn had concerns that we are not calling the error handler if registered and
then calling remove() callback while the driver is in the middle of something
could be bad. 

He had concerns if remove() would leave something in a bad state so recovery
would really not work at all and kernel crashes eventually due to data corruption. 

Oza and I are looking for a way to plumb DPC's error handling into AER driver
so that PCI framework has a single place to look for error handling.

for dpc:
1. If an error handler registered, call it for all children devices
2. Remove all children devices from the bus
3. Recover the link with DPC
4. Rescan the entire bus and install the drivers again

> 
>> DPC driver clears
>> the interrupt from the DPC capability and brings the link up at the end. Full
>> enumeration/rescan follows this procedure to go back to functioning state.
> That might not be a good idea, very likely it will fail again immediately.

We can add a policy parameter and not bring up the link if you want to do
troubleshooting at the point of failure or have a way to define how the
system response should be. 

DPC causes a hot reset on the bus. Endpoint should go to reset state and we should
be able to bring up the link without any problems under normal circumstances.

[v2,4/4] PCI/AER: Dont do recovery when DPC is enabled

Commit Message

Comments

Patch