
pci: allow user to specify a reset wait timeout

Message ID 20250207204310.2546091-1-kbusch@meta.com (mailing list archive)
State Superseded
Delegated to: Bjorn Helgaas
Series pci: allow user to specify a reset wait timeout

Commit Message

Keith Busch Feb. 7, 2025, 8:43 p.m. UTC
From: Keith Busch <kbusch@kernel.org>

The spec does not provide any upper limit to how long a device may
return Request Retry Status. It just says "Some devices require a
lengthy self-initialization sequence to complete". The kernel
arbitrarily chose 60 seconds since that really ought to be enough. But
there are devices where this turns out not to be enough.

Since any timeout choice would be arbitrary, and 60 seconds is generally
more than enough for the majority of hardware, let's make this a
parameter so an admin can adjust it specifically to their needs if the
default timeout isn't appropriate.

Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 Documentation/admin-guide/kernel-parameters.txt | 3 +++
 drivers/pci/pci.c                               | 6 +++++-
 2 files changed, 8 insertions(+), 1 deletion(-)

Comments

Lukas Wunner Feb. 8, 2025, 4:50 a.m. UTC | #1
On Fri, Feb 07, 2025 at 12:43:10PM -0800, Keith Busch wrote:
> The spec does not provide any upper limit to how long a device may
> return Request Retry Status. It just says "Some devices require a
> lengthy self-initialization sequence to complete". The kernel
> arbitrarily chose 60 seconds since that really ought to be enough. But
> there are devices where this turns out not to be enough.
> 
> Since any timeout choice would be arbitrary, and 60 seconds is generally
> more than enough for the majority of hardware, let's make this a
> parameter so an admin can adjust it specifically to their needs if the
> default timeout isn't appropriate.

There are d3hot_delay and d3cold_delay members in struct pci_dev.
How about adding a reset_delay which can be set in a device-specific
quirk?  I think I'd prefer that over a command line parameter.

A D3cold -> D0 transition implies a reset, but I'm not sure it's
appropriate to (ab)use d3cold_delay as a reset_delay.

Thanks,

Lukas
Keith Busch Feb. 10, 2025, 2:59 p.m. UTC | #2
On Sat, Feb 08, 2025 at 05:50:04AM +0100, Lukas Wunner wrote:
> On Fri, Feb 07, 2025 at 12:43:10PM -0800, Keith Busch wrote:
> > The spec does not provide any upper limit to how long a device may
> > return Request Retry Status. It just says "Some devices require a
> > lengthy self-initialization sequence to complete". The kernel
> > arbitrarily chose 60 seconds since that really ought to be enough. But
> > there are devices where this turns out not to be enough.
> > 
> > Since any timeout choice would be arbitrary, and 60 seconds is generally
> > more than enough for the majority of hardware, let's make this a
> > parameter so an admin can adjust it specifically to their needs if the
> > default timeout isn't appropriate.
> 
> There are d3hot_delay and d3cold_delay members in struct pci_dev.
> How about adding a reset_delay which can be set in a device-specific
> quirk?  I think I'd prefer that over a command line parameter.
> 
> A D3cold -> D0 transition implies a reset, but I'm not sure it's
> appropriate to (ab)use d3cold_delay as a reset_delay.

My concern with quirking it is that we'd have to settle on what we think
is the worst case timeout, then it becomes compiled into that kernel for
that device. The devices I'm dealing with are actively under
development, and the time to ready gets bigger or smaller as updates
occur, or some new worst case scenario is discovered. Making this a boot
time decision really helps with experimentation here.
Lukas Wunner Feb. 10, 2025, 3:15 p.m. UTC | #3
On Mon, Feb 10, 2025 at 07:59:01AM -0700, Keith Busch wrote:
> On Sat, Feb 08, 2025 at 05:50:04AM +0100, Lukas Wunner wrote:
> > On Fri, Feb 07, 2025 at 12:43:10PM -0800, Keith Busch wrote:
> > > The spec does not provide any upper limit to how long a device may
> > > return Request Retry Status. It just says "Some devices require a
> > > lengthy self-initialization sequence to complete". The kernel
> > > arbitrarily chose 60 seconds since that really ought to be enough. But
> > > there are devices where this turns out not to be enough.
> > > 
> > > Since any timeout choice would be arbitrary, and 60 seconds is generally
> > > more than enough for the majority of hardware, let's make this a
> > > parameter so an admin can adjust it specifically to their needs if the
> > > default timeout isn't appropriate.
> > 
> > There are d3hot_delay and d3cold_delay members in struct pci_dev.
> > How about adding a reset_delay which can be set in a device-specific
> > quirk?  I think I'd prefer that over a command line parameter.
> > 
> > A D3cold -> D0 transition implies a reset, but I'm not sure it's
> > appropriate to (ab)use d3cold_delay as a reset_delay.
> 
> My concern with quirking it is that we'd have to settle on what we think
> is the worst case timeout, then it becomes compiled into that kernel for
> that device. The devices I'm dealing with are actively under
> development, and the time to ready gets bigger or smaller as updates
> occur, or some new worst case scenario is discovered. Making this a boot
> time decision really helps with experimentation here.

I understand, but honestly this doesn't sound like something which
needs to be in the upstream kernel.  If it's for experimentation only,
I'd keep it in the downstream kernel used for experimentation
and if it turns out that 60 sec is insufficient for the final
production device, I'd submit a quirk for that.

Thanks,

Lukas
Keith Busch Feb. 10, 2025, 3:32 p.m. UTC | #4
On Mon, Feb 10, 2025 at 04:15:54PM +0100, Lukas Wunner wrote:
> On Mon, Feb 10, 2025 at 07:59:01AM -0700, Keith Busch wrote:
> > My concern with quirking it is that we'd have to settle on what we think
> > is the worst case timeout, then it becomes compiled into that kernel for
> > that device. The devices I'm dealing with are actively under
> > development, and the time to ready gets bigger or smaller as updates
> > occur, or some new worst case scenario is discovered. Making this a boot
> > time decision really helps with experimentation here.
> 
> I understand, but honestly this doesn't sound like something which
> needs to be in the upstream kernel.  If it's for experimentation only,
> I'd keep it in the downstream kernel used for experimentation
> and if it turns out that 60 sec is insufficient for the final
> production device, I'd submit a quirk for that.

It's always a pain to carry out-of-tree patches. These devices may be
under active development, but they are used in production, and the
systems they're in follow the standard kernel updates. And before this
generation of devices even settles on what an appropriate quirk timeout
might be (if that ever happens), I have the next generations to deal
with, so this need isn't going to go away. Carrying such an out-of-tree
patch for eternity sounds unpleasant.
Ilpo Järvinen Feb. 13, 2025, 1:37 p.m. UTC | #5
On Fri, 7 Feb 2025, Keith Busch wrote:

> From: Keith Busch <kbusch@kernel.org>
> 
> The spec does not provide any upper limit to how long a device may
> return Request Retry Status. It just says "Some devices require a
> lengthy self-initialization sequence to complete". The kernel
> arbitrarily chose 60 seconds since that really ought to be enough. But
> there are devices where this turns out not to be enough.
> 
> Since any timeout choice would be arbitrary, and 60 seconds is generally
> more than enough for the majority of hardware, let's make this a
> parameter so an admin can adjust it specifically to their needs if the
> default timeout isn't appropriate.
> 
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>  Documentation/admin-guide/kernel-parameters.txt | 3 +++
>  drivers/pci/pci.c                               | 6 +++++-
>  2 files changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index fb8752b42ec85..1aed555ef8b40 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4843,6 +4843,9 @@
>  
>  				Note: this may remove isolation between devices
>  				and may put more devices in an IOMMU group.
> +		reset_wait=nn	The number of milliseconds to wait after a
> +				reset while seeing Request Retry Status.
> +				Default is 60000 (1 minute).
>  		force_floating	[S390] Force usage of floating interrupts.
>  		nomio		[S390] Do not use MIO instructions.
>  		norid		[S390] ignore the RID field and force use of
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 869d204a70a37..20817dd5ebba7 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -75,7 +75,8 @@ struct pci_pme_device {
>   * limit, but 60 sec ought to be enough for any device to become
>   * responsive.
>   */
> -#define PCIE_RESET_READY_POLL_MS 60000 /* msec */
> +#define PCIE_RESET_READY_POLL_MS pci_reset_ready_wait
> +unsigned long pci_reset_ready_wait = 60000; /* msec */

I don't think masking variables with defines like that is a good idea.

I also suggest you put the unit as a postfix to the variable name.

>  static void pci_dev_d3_sleep(struct pci_dev *dev)
>  {
> @@ -6841,6 +6842,9 @@ static int __init pci_setup(char *str)
>  				disable_acs_redir_param = str + 18;
>  			} else if (!strncmp(str, "config_acs=", 11)) {
>  				config_acs_param = str + 11;
> +			} else if (!strncmp(str, "reset_wait=", 11)) {
> +				pci_reset_ready_wait =
> +					simple_strtoul(str + 11, &str, 0);
>  			} else {
>  				pr_err("PCI: Unknown option `%s'\n", str);
>  			}
>

Patch

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index fb8752b42ec85..1aed555ef8b40 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4843,6 +4843,9 @@ 
 
 				Note: this may remove isolation between devices
 				and may put more devices in an IOMMU group.
+		reset_wait=nn	The number of milliseconds to wait after a
+				reset while seeing Request Retry Status.
+				Default is 60000 (1 minute).
 		force_floating	[S390] Force usage of floating interrupts.
 		nomio		[S390] Do not use MIO instructions.
 		norid		[S390] ignore the RID field and force use of
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 869d204a70a37..20817dd5ebba7 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -75,7 +75,8 @@  struct pci_pme_device {
  * limit, but 60 sec ought to be enough for any device to become
  * responsive.
  */
-#define PCIE_RESET_READY_POLL_MS 60000 /* msec */
+#define PCIE_RESET_READY_POLL_MS pci_reset_ready_wait
+unsigned long pci_reset_ready_wait = 60000; /* msec */
 
 static void pci_dev_d3_sleep(struct pci_dev *dev)
 {
@@ -6841,6 +6842,9 @@  static int __init pci_setup(char *str)
 				disable_acs_redir_param = str + 18;
 			} else if (!strncmp(str, "config_acs=", 11)) {
 				config_acs_param = str + 11;
+			} else if (!strncmp(str, "reset_wait=", 11)) {
+				pci_reset_ready_wait =
+					simple_strtoul(str + 11, &str, 0);
 			} else {
 				pr_err("PCI: Unknown option `%s'\n", str);
 			}