diff mbox

[V2,3/5] PCI: save and restore bus on parent bus reset

Message ID 1474056395-21843-4-git-send-email-okaya@codeaurora.org (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show

Commit Message

Sinan Kaya Sept. 16, 2016, 8:06 p.m. UTC
Device states on the bus are saved and restored for all bus resets except
the one initiated through pci_dev_reset. Filling the hole.

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/pci/pci.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

Comments

Bjorn Helgaas Sept. 29, 2016, 9:49 p.m. UTC | #1
Hi Sinan,

On Fri, Sep 16, 2016 at 04:06:32PM -0400, Sinan Kaya wrote:
> Device states on the bus are saved and restored for all bus resets except
> the one initiated through pci_dev_reset. Filling the hole.
> 
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> ---
>  drivers/pci/pci.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index aab9d51..8aecab1 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -51,6 +51,10 @@ static void pci_pme_list_scan(struct work_struct *work);
>  static LIST_HEAD(pci_pme_list);
>  static DEFINE_MUTEX(pci_pme_list_mutex);
>  static DECLARE_DELAYED_WORK(pci_pme_work, pci_pme_list_scan);
> +static void pci_dev_lock(struct pci_dev *dev);
> +static void pci_dev_unlock(struct pci_dev *dev);
> +static void pci_bus_save_and_disable(struct pci_bus *bus);
> +static void pci_bus_restore(struct pci_bus *bus);
>  
>  struct pci_pme_device {
>  	struct list_head list;
> @@ -3888,8 +3892,18 @@ static int pci_parent_bus_reset(struct pci_dev *dev, int probe)
>  	if (probe)
>  		return 0;
>  
> +	if (!probe) {
> +		pci_dev_unlock(dev);
> +		pci_bus_save_and_disable(dev->bus);
> +	}
> +
>  	pci_reset_bridge_secondary_bus(dev->bus->self);
>  
> +	if (!probe) {
> +		pci_bus_restore(dev->bus);
> +		pci_dev_lock(dev);
> +	}

This pattern of "unlock, do something, relock" needs some
justification.  In general it's unsafe because the lock is protecting
*something*, and you have to assume that something can change as soon
as you unlock.  Maybe you know it's safe in this situation, and if so,
the explanation of why it's safe is what I'm looking for.

Also, you're now calling pci_reset_bridge_secondary_bus() with the dev
unlocked, where we called it with the dev locked before.  Some (but
worryingly, not all) of the other pci_reset_bridge_secondary_bus()
callers also have the dev locked.  I didn't look long enough to figure
out if there is a strategy there or if these inconsistencies are
latent bugs.

> +
>  	return 0;
>  }
>  
> -- 
> 1.9.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sinan Kaya Sept. 29, 2016, 11:50 p.m. UTC | #2
Hi Bjorn,

On 9/29/2016 5:49 PM, Bjorn Helgaas wrote:
>> +	}
> This pattern of "unlock, do something, relock" needs some
> justification.  In general it's unsafe because the lock is protecting
> *something*, and you have to assume that something can change as soon
> as you unlock.  Maybe you know it's safe in this situation, and if so,
> the explanation of why it's safe is what I'm looking for.

Agreed. 

The problem is that save and restore routines obtain the lock again and
they fails as the lock is already held.

The alternative is to change the dev_locks in save and restore to try_lock
so that it will work if locks were previously obtained or not. 

> 
> Also, you're now calling pci_reset_bridge_secondary_bus() with the dev
> unlocked, where we called it with the dev locked before.  Some (but
> worryingly, not all) of the other pci_reset_bridge_secondary_bus()
> callers also have the dev locked.  I didn't look long enough to figure
> out if there is a strategy there or if these inconsistencies are
> latent bugs.
> 

The goal of this routine is to reset the device not the bridge and the code
will use FLR or others if available. Therefore, it makes perfect sense to
obtain the device lock while doing this. 

The code tries to reset the bus if none of the other resets work. This is
where the problem is. It destroys the context for other devices.

I can fix get rid of this unlock, do something and then lock again business
by rewriting the locks in save and restore.

Sinan
Sinan Kaya Oct. 3, 2016, 3:34 a.m. UTC | #3
On 9/29/2016 7:50 PM, Sinan Kaya wrote:
> Hi Bjorn,
> 
> On 9/29/2016 5:49 PM, Bjorn Helgaas wrote:
>>> +	}
>> This pattern of "unlock, do something, relock" needs some
>> justification.  In general it's unsafe because the lock is protecting
>> *something*, and you have to assume that something can change as soon
>> as you unlock.  Maybe you know it's safe in this situation, and if so,
>> the explanation of why it's safe is what I'm looking for.
> 
> Agreed. 
> 
> The problem is that save and restore routines obtain the lock again and
> they fails as the lock is already held.
> 
> The alternative is to change the dev_locks in save and restore to try_lock
> so that it will work if locks were previously obtained or not. 
> 
>>
>> Also, you're now calling pci_reset_bridge_secondary_bus() with the dev
>> unlocked, where we called it with the dev locked before.  Some (but
>> worryingly, not all) of the other pci_reset_bridge_secondary_bus()
>> callers also have the dev locked.  I didn't look long enough to figure
>> out if there is a strategy there or if these inconsistencies are
>> latent bugs.
>>
> 
> The goal of this routine is to reset the device not the bridge and the code
> will use FLR or others if available. Therefore, it makes perfect sense to
> obtain the device lock while doing this. 
> 
> The code tries to reset the bus if none of the other resets work. This is
> where the problem is. It destroys the context for other devices.
> 
> I can fix get rid of this unlock, do something and then lock again business
> by rewriting the locks in save and restore.
> 
> Sinan
> 

I looked at the code a little bit more. I missed the fact pci_parent_bus_reset
will only work if there is only PCI device on the bus and that device is the
reset requesting device.

I'll drop this patch as well as the infiniband version for the same reason.
diff mbox

Patch

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index aab9d51..8aecab1 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -51,6 +51,10 @@  static void pci_pme_list_scan(struct work_struct *work);
 static LIST_HEAD(pci_pme_list);
 static DEFINE_MUTEX(pci_pme_list_mutex);
 static DECLARE_DELAYED_WORK(pci_pme_work, pci_pme_list_scan);
+static void pci_dev_lock(struct pci_dev *dev);
+static void pci_dev_unlock(struct pci_dev *dev);
+static void pci_bus_save_and_disable(struct pci_bus *bus);
+static void pci_bus_restore(struct pci_bus *bus);
 
 struct pci_pme_device {
 	struct list_head list;
@@ -3888,8 +3892,18 @@  static int pci_parent_bus_reset(struct pci_dev *dev, int probe)
 	if (probe)
 		return 0;
 
+	if (!probe) {
+		pci_dev_unlock(dev);
+		pci_bus_save_and_disable(dev->bus);
+	}
+
 	pci_reset_bridge_secondary_bus(dev->bus->self);
 
+	if (!probe) {
+		pci_bus_restore(dev->bus);
+		pci_dev_lock(dev);
+	}
+
 	return 0;
 }