diff mbox series

[PATCHv2,2/2] pci: warn if a running device is unaware of reset

Message ID 20241025222755.3756162-2-kbusch@meta.com (mailing list archive)
State Accepted
Commit a3151e6daaec171b7d46ac79170ec420ad874cae
Delegated to: Bjorn Helgaas
Headers show
Series [PATCHv2,1/2] pci: provide bus reset attribute | expand

Commit Message

Keith Busch Oct. 25, 2024, 10:27 p.m. UTC
From: Keith Busch <kbusch@kernel.org>

If a reset is issued to a running device with a driver that didn't
register the notification callbacks, the driver may be unaware of this
event and have an inconsistent view of the device's state. Log a warning
of this event because there's nothing else indicating the event occured,
which could be confusing when debugging such situations.

Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 drivers/pci/pci.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Alex Williamson Oct. 28, 2024, 7:35 p.m. UTC | #1
On Fri, 25 Oct 2024 15:27:55 -0700
Keith Busch <kbusch@meta.com> wrote:

> From: Keith Busch <kbusch@kernel.org>
> 
> If a reset is issued to a running device with a driver that didn't
> register the notification callbacks, the driver may be unaware of this
> event and have an inconsistent view of the device's state. Log a warning
> of this event because there's nothing else indicating the event occured,
> which could be confusing when debugging such situations.
> 
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>  drivers/pci/pci.c | 4 ++++
>  1 file changed, 4 insertions(+)

Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Niklas Schnelle Oct. 29, 2024, 11:27 a.m. UTC | #2
On Fri, 2024-10-25 at 15:27 -0700, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> If a reset is issued to a running device with a driver that didn't
> register the notification callbacks, the driver may be unaware of this
> event and have an inconsistent view of the device's state. Log a warning
> of this event because there's nothing else indicating the event occured,
> which could be confusing when debugging such situations.
> 
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>  drivers/pci/pci.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 338dfcd983f1e..bbf12d4998269 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5158,6 +5158,8 @@ static void pci_dev_save_and_disable(struct pci_dev *dev)
>  	 */
>  	if (err_handler && err_handler->reset_prepare)
>  		err_handler->reset_prepare(dev);
> +	else if (dev->driver)
> +		pci_warn(dev, "resetting");
>  
>  	/*
>  	 * Wake-up device prior to save.  PM registers default to D0 after
> @@ -5191,6 +5193,8 @@ static void pci_dev_restore(struct pci_dev *dev)
>  	 */
>  	if (err_handler && err_handler->reset_done)
>  		err_handler->reset_done(dev);
> +	else if (dev->driver)
> +		pci_warn(dev, "reset done");
>  }
>  
>  /* dev->reset_methods[] is a 0-terminated list of indices into this array */

For what it's worth on s390 I think the previous proposal of having the
attribute on the pci_bus would have been better in principle. On s390
the PCI bus probing is done by firmware and Linux doesn't see a pci_dev
for bridges but we do create struct pci_bus for example for a PF and
its child VFs.

Then again we can't really do a reset on this level, other than
manually going through the PCI functions and resetting them one by one.
In fact we may see PCI functions on their own bus while another Linux
instance (LPAR) sees other functions from that bus. So yeah, I guess
it's fair not to have this attribute but still wanted to offer these
thoughts.

Thanks,
Niklas
ameynarkhede03 Oct. 29, 2024, 3:06 p.m. UTC | #3
On 24/10/25 03:27pm, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
>
> If a reset is issued to a running device with a driver that didn't
> register the notification callbacks, the driver may be unaware of this
> event and have an inconsistent view of the device's state. Log a warning
> of this event because there's nothing else indicating the event occured,
> which could be confusing when debugging such situations.
>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Amey Narkhede <ameynarkhede03@gmail.com>

> ---
>  drivers/pci/pci.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 338dfcd983f1e..bbf12d4998269 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5158,6 +5158,8 @@ static void pci_dev_save_and_disable(struct pci_dev *dev)
>  	 */
>  	if (err_handler && err_handler->reset_prepare)
>  		err_handler->reset_prepare(dev);
> +	else if (dev->driver)
> +		pci_warn(dev, "resetting");
>
>  	/*
>  	 * Wake-up device prior to save.  PM registers default to D0 after
> @@ -5191,6 +5193,8 @@ static void pci_dev_restore(struct pci_dev *dev)
>  	 */
>  	if (err_handler && err_handler->reset_done)
>  		err_handler->reset_done(dev);
> +	else if (dev->driver)
> +		pci_warn(dev, "reset done");
>  }
>
>  /* dev->reset_methods[] is a 0-terminated list of indices into this array */
> --
> 2.43.5
>
Keith Busch Nov. 1, 2024, 7:21 p.m. UTC | #4
On Tue, Oct 29, 2024 at 12:27:41PM +0100, Niklas Schnelle wrote:
> For what it's worth on s390 I think the previous proposal of having the
> attribute on the pci_bus would have been better in principle. On s390
> the PCI bus probing is done by firmware and Linux doesn't see a pci_dev
> for bridges but we do create struct pci_bus for example for a PF and
> its child VFs.
> 
> Then again we can't really do a reset on this level, other than
> manually going through the PCI functions and resetting them one by one.
> In fact we may see PCI functions on their own bus while another Linux
> instance (LPAR) sees other functions from that bus. So yeah, I guess
> it's fair not to have this attribute but still wanted to offer these
> thoughts.

This attribute uses the pci_dev bridge control register. If you don't
have the bridge device, I don't think this would be useful, so I guess
you'd have to fallback to resetting individual functions.

It seems people can navigate /sys/bus/pci/devices/ easier than finding a
pci_bus under /sys/devices/, though, so that's a plus for pci_dev.
Niklas Schnelle Nov. 4, 2024, 9:44 a.m. UTC | #5
On Fri, 2024-11-01 at 13:21 -0600, Keith Busch wrote:
> On Tue, Oct 29, 2024 at 12:27:41PM +0100, Niklas Schnelle wrote:
> > For what it's worth on s390 I think the previous proposal of having the
> > attribute on the pci_bus would have been better in principle. On s390
> > the PCI bus probing is done by firmware and Linux doesn't see a pci_dev
> > for bridges but we do create struct pci_bus for example for a PF and
> > its child VFs.
> > 
> > Then again we can't really do a reset on this level, other than
> > manually going through the PCI functions and resetting them one by one.
> > In fact we may see PCI functions on their own bus while another Linux
> > instance (LPAR) sees other functions from that bus. So yeah, I guess
> > it's fair not to have this attribute but still wanted to offer these
> > thoughts.
> 
> This attribute uses the pci_dev bridge control register. If you don't
> have the bridge device, I don't think this would be useful, so I guess
> you'd have to fallback to resetting individual functions.
> 
> It seems people can navigate /sys/bus/pci/devices/ easier than finding a
> pci_bus under /sys/devices/, though, so that's a plus for pci_dev.
> 

Yes you're right, since the reset via __pci_reset_bus() and then
pci_bridge_secondary_bus_reset() requires the bridge device (unlike
pci_reset_bus()) it makes more sense to have it on the bridge thus also
requiring the bridge device to exist.

I still kind of wish pci_bridge_secondary_bus_reset() and
pcibios_reset_secondary_bus() would take a struct pci_bus and then we
could do the fallback in the latter platform specific code and having
the attribute on the bus would make sense, but I'm not sure it's all
that useful.

One more question though, what would happen with this reset for a bus
with an SR-IOV device with more than 256 VFs i.e. where
pci_iov_virtfn_bus() returns anything other than 0. I'm guessing since
VFs are physically still controlled by the bridge all VFs would be
reset but at the same time virtfn_add_bus() sets the bridge device for
the added bus as NULL so I think it might look odd in sysfs, sadly I
don't have such a device to test with. Still, this might actually be an
argument for having the attribute on the bridge.

Thanks,
Niklas
Keith Busch Nov. 4, 2024, 5:01 p.m. UTC | #6
On Mon, Nov 04, 2024 at 10:44:23AM +0100, Niklas Schnelle wrote:
> 
> One more question though, what would happen with this reset for a bus
> with an SR-IOV device with more than 256 VFs i.e. where
> pci_iov_virtfn_bus() returns anything other than 0. I'm guessing since
> VFs are physically still controlled by the bridge all VFs would be
> reset but at the same time virtfn_add_bus() sets the bridge device for
> the added bus as NULL so I think it might look odd in sysfs, sadly I
> don't have such a device to test with. Still, this might actually be an
> argument for having the attribute on the bridge.

I assume everything is reset at the PCI level.

Are you asking what the kernel does? I don't think it does anything
special with SR-IOV functions. Those pci_dev's aren't attached to the
bridge pci_dev; you have to go through the pci_bus' children instead.
Niklas Schnelle Nov. 5, 2024, 12:28 p.m. UTC | #7
On Mon, 2024-11-04 at 10:01 -0700, Keith Busch wrote:
> On Mon, Nov 04, 2024 at 10:44:23AM +0100, Niklas Schnelle wrote:
> > 
> > One more question though, what would happen with this reset for a bus
> > with an SR-IOV device with more than 256 VFs i.e. where
> > pci_iov_virtfn_bus() returns anything other than 0. I'm guessing since
> > VFs are physically still controlled by the bridge all VFs would be
> > reset but at the same time virtfn_add_bus() sets the bridge device for
> > the added bus as NULL so I think it might look odd in sysfs, sadly I
> > don't have such a device to test with. Still, this might actually be an
> > argument for having the attribute on the bridge.
> 
> I assume everything is reset at the PCI level.
> 
> Are you asking what the kernel does? I don't think it does anything
> special with SR-IOV functions. Those pci_dev's aren't attached to the
> bridge pci_dev; you have to go through the pci_bus' children instead.
> 

I just want to make sure we're okay with the behavior with such VFs as
it seems like the one case where a reset via the bridge affects PCI
functions which aren't attached to the bridge pci_dev otherwise. And
for example as I understand it these would not be covered by the
pci_bus_save_and_disable_locked().

Thanks,
Niklas
Keith Busch Nov. 5, 2024, 3:46 p.m. UTC | #8
On Tue, Nov 05, 2024 at 01:28:39PM +0100, Niklas Schnelle wrote:
> On Mon, 2024-11-04 at 10:01 -0700, Keith Busch wrote:
> > On Mon, Nov 04, 2024 at 10:44:23AM +0100, Niklas Schnelle wrote:
> > > 
> > > One more question though, what would happen with this reset for a bus
> > > with an SR-IOV device with more than 256 VFs i.e. where
> > > pci_iov_virtfn_bus() returns anything other than 0. I'm guessing since
> > > VFs are physically still controlled by the bridge all VFs would be
> > > reset but at the same time virtfn_add_bus() sets the bridge device for
> > > the added bus as NULL so I think it might look odd in sysfs, sadly I
> > > don't have such a device to test with. Still, this might actually be an
> > > argument for having the attribute on the bridge.
> > 
> > I assume everything is reset at the PCI level.
> > 
> > Are you asking what the kernel does? I don't think it does anything
> > special with SR-IOV functions. Those pci_dev's aren't attached to the
> > bridge pci_dev; you have to go through the pci_bus' children instead.
> > 
> 
> I just want to make sure we're okay with the behavior with such VFs as
> it seems like the one case where a reset via the bridge affects PCI
> functions which aren't attached to the bridge pci_dev otherwise. And
> for example as I understand it these would not be covered by the
> pci_bus_save_and_disable_locked().

Well, it's no different than doing FLR to the PF while VFs are
configured, which is existing behavior, so I think it's okay. The PF's
SRIOV state still gets restored to the end of the reset. The VFs
themselves don't get saved and locked, though. But again, this new patch
is following how it already works for other reset methods.
diff mbox series

Patch

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 338dfcd983f1e..bbf12d4998269 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5158,6 +5158,8 @@  static void pci_dev_save_and_disable(struct pci_dev *dev)
 	 */
 	if (err_handler && err_handler->reset_prepare)
 		err_handler->reset_prepare(dev);
+	else if (dev->driver)
+		pci_warn(dev, "resetting");
 
 	/*
 	 * Wake-up device prior to save.  PM registers default to D0 after
@@ -5191,6 +5193,8 @@  static void pci_dev_restore(struct pci_dev *dev)
 	 */
 	if (err_handler && err_handler->reset_done)
 		err_handler->reset_done(dev);
+	else if (dev->driver)
+		pci_warn(dev, "reset done");
 }
 
 /* dev->reset_methods[] is a 0-terminated list of indices into this array */