Message ID | 20250207204310.2546091-1-kbusch@meta.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Bjorn Helgaas |
Series | pci: allow user specifiy a reset wait timeout |
On Fri, Feb 07, 2025 at 12:43:10PM -0800, Keith Busch wrote:
> The spec does not provide any upper limit to how long a device may
> return Request Retry Status. It just says "Some devices require a
> lengthy self-initialization sequence to complete". The kernel
> arbitrarily chose 60 seconds since that really ought to be enough. But
> there are devices where this turns out not to be enough.
>
> Since any timeout choice would be arbitrary, and 60 seconds is generally
> more than enough for the majority of hardware, let's make this a
> parameter so an admin can adjust it specifically to their needs if the
> default timeout isn't appropriate.

There are d3hot_delay and d3cold_delay members in struct pci_dev.
How about adding a reset_delay which can be set in a device-specific
quirk? I think I'd prefer that over a command line parameter.

A D3cold -> D0 transition implies a reset, but I'm not sure it's
appropriate to (ab)use d3cold_delay as a reset_delay.

Thanks,

Lukas
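The reset_delay suggestion can be illustrated with a small user-space C sketch. Everything in it is hypothetical: struct mock_pci_dev, the reset_delay_ms member, the quirk table, and the vendor/device IDs are made up for illustration only. A real quirk would be registered through the kernel's DECLARE_PCI_FIXUP_* macros in drivers/pci/quirks.c rather than looked up in a table like this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Default poll timeout, matching the 60 s discussed in the thread. */
#define DEFAULT_RESET_DELAY_MS 60000u

/* Hypothetical stand-in for struct pci_dev; only the fields needed here. */
struct mock_pci_dev {
	uint16_t vendor;
	uint16_t device;
	unsigned int reset_delay_ms;	/* hypothetical new member */
};

/* Hypothetical quirk entry: a device that needs a longer wait. */
struct reset_delay_quirk {
	uint16_t vendor;
	uint16_t device;
	unsigned int reset_delay_ms;
};

/* Made-up IDs and delays, for illustration only. */
static const struct reset_delay_quirk reset_delay_quirks[] = {
	{ 0x1234, 0x0001, 120000u },
	{ 0x1234, 0x0002,  90000u },
};

/* Start from the default, then let a matching quirk override it. */
static void mock_apply_reset_delay_quirk(struct mock_pci_dev *dev)
{
	size_t i;
	size_t n = sizeof(reset_delay_quirks) / sizeof(reset_delay_quirks[0]);

	assert(dev != NULL);
	dev->reset_delay_ms = DEFAULT_RESET_DELAY_MS;
	for (i = 0; i < n; i++) {
		if (reset_delay_quirks[i].vendor == dev->vendor &&
		    reset_delay_quirks[i].device == dev->device) {
			dev->reset_delay_ms = reset_delay_quirks[i].reset_delay_ms;
			return;
		}
	}
}
```

The trade-off raised later in the thread is visible here: the per-device value is fixed at compile time, so changing it means rebuilding the kernel.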
On Sat, Feb 08, 2025 at 05:50:04AM +0100, Lukas Wunner wrote:
> On Fri, Feb 07, 2025 at 12:43:10PM -0800, Keith Busch wrote:
> > The spec does not provide any upper limit to how long a device may
> > return Request Retry Status. It just says "Some devices require a
> > lengthy self-initialization sequence to complete". The kernel
> > arbitrarily chose 60 seconds since that really ought to be enough. But
> > there are devices where this turns out not to be enough.
> >
> > Since any timeout choice would be arbitrary, and 60 seconds is generally
> > more than enough for the majority of hardware, let's make this a
> > parameter so an admin can adjust it specifically to their needs if the
> > default timeout isn't appropriate.
>
> There are d3hot_delay and d3cold_delay members in struct pci_dev.
> How about adding a reset_delay which can be set in a device-specific
> quirk? I think I'd prefer that over a command line parameter.
>
> A D3cold -> D0 transition implies a reset, but I'm not sure it's
> appropriate to (ab)use d3cold_delay as a reset_delay.

My concern with quirking it is that we'd have to settle on what we think
is the worst case timeout, then it becomes compiled into that kernel for
that device. The devices I'm dealing with are actively under
development, and the time to ready gets bigger or smaller as updates
occur, or some new worst case scenario is discovered. Making this a boot
time decision really helps with experimentation here.
On Mon, Feb 10, 2025 at 07:59:01AM -0700, Keith Busch wrote:
> On Sat, Feb 08, 2025 at 05:50:04AM +0100, Lukas Wunner wrote:
> > On Fri, Feb 07, 2025 at 12:43:10PM -0800, Keith Busch wrote:
> > > The spec does not provide any upper limit to how long a device may
> > > return Request Retry Status. It just says "Some devices require a
> > > lengthy self-initialization sequence to complete". The kernel
> > > arbitrarily chose 60 seconds since that really ought to be enough. But
> > > there are devices where this turns out not to be enough.
> > >
> > > Since any timeout choice would be arbitrary, and 60 seconds is generally
> > > more than enough for the majority of hardware, let's make this a
> > > parameter so an admin can adjust it specifically to their needs if the
> > > default timeout isn't appropriate.
> >
> > There are d3hot_delay and d3cold_delay members in struct pci_dev.
> > How about adding a reset_delay which can be set in a device-specific
> > quirk? I think I'd prefer that over a command line parameter.
> >
> > A D3cold -> D0 transition implies a reset, but I'm not sure it's
> > appropriate to (ab)use d3cold_delay as a reset_delay.
>
> My concern with quirking it is that we'd have to settle on what we think
> is the worst case timeout, then it becomes compiled into that kernel for
> that device. The devices I'm dealing with are actively under
> development, and the time to ready gets bigger or smaller as updates
> occur, or some new worst case scenario is discovered. Making this a boot
> time decision really helps with experimentation here.

I understand, but honestly this doesn't sound like something which
needs to be in the upstream kernel. If it's for experimentation only,
I'd keep it in the downstream kernel used for experimentation
and if it turns out that 60 sec is insufficient for the final
production device, I'd submit a quirk for that.

Thanks,

Lukas
On Mon, Feb 10, 2025 at 04:15:54PM +0100, Lukas Wunner wrote:
> On Mon, Feb 10, 2025 at 07:59:01AM -0700, Keith Busch wrote:
> > My concern with quirking it is that we'd have to settle on what we think
> > is the worst case timeout, then it becomes compiled into that kernel for
> > that device. The devices I'm dealing with are actively under
> > development, and the time to ready gets bigger or smaller as updates
> > occur, or some new worst case scenario is discovered. Making this a boot
> > time decision really helps with experimentation here.
>
> I understand, but honestly this doesn't sound like something which
> needs to be in the upstream kernel. If it's for experimentation only,
> I'd keep it in the downstream kernel used for experimentation
> and if it turns out that 60 sec is insufficient for the final
> production device, I'd submit a quirk for that.

It's always a pain to carry out-of-tree patches. These might be devices
under active development, but they are used in production, and the
systems they're in follow the standard kernel updates. And before this
generation of devices even settles on an appropriate quirk timeout (if
that ever happens), I have the next generations to deal with, so this
need isn't going to go away. Carrying such an out-of-tree patch for
eternity sounds unpleasant.
On Fri, 7 Feb 2025, Keith Busch wrote:

> From: Keith Busch <kbusch@kernel.org>
>
> The spec does not provide any upper limit to how long a device may
> return Request Retry Status. It just says "Some devices require a
> lengthy self-initialization sequence to complete". The kernel
> arbitrarily chose 60 seconds since that really ought to be enough. But
> there are devices where this turns out not to be enough.
>
> Since any timeout choice would be arbitrary, and 60 seconds is generally
> more than enough for the majority of hardware, let's make this a
> parameter so an admin can adjust it specifically to their needs if the
> default timeout isn't appropriate.
>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>  Documentation/admin-guide/kernel-parameters.txt | 3 +++
>  drivers/pci/pci.c                               | 6 +++++-
>  2 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index fb8752b42ec85..1aed555ef8b40 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4843,6 +4843,9 @@
>
>  				Note: this may remove isolation between devices
>  				and may put more devices in an IOMMU group.
> +		reset_wait=nn	The number of milliseconds to wait after a
> +				reset while seeing Request Retry Status.
> +				Default is 60000 (1 minute).
>  		force_floating	[S390] Force usage of floating interrupts.
>  		nomio		[S390] Do not use MIO instructions.
>  		norid		[S390] ignore the RID field and force use of
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 869d204a70a37..20817dd5ebba7 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -75,7 +75,8 @@ struct pci_pme_device {
>   * limit, but 60 sec ought to be enough for any device to become
>   * responsive.
>   */
> -#define PCIE_RESET_READY_POLL_MS 60000	/* msec */
> +#define PCIE_RESET_READY_POLL_MS pci_reset_ready_wait
> +unsigned long pci_reset_ready_wait = 60000; /* msec */

I don't think masking variables with defines like that is a good idea.

I also suggest you put the unit as a postfix to the variable name.

>  static void pci_dev_d3_sleep(struct pci_dev *dev)
>  {
> @@ -6841,6 +6842,9 @@ static int __init pci_setup(char *str)
>  				disable_acs_redir_param = str + 18;
>  			} else if (!strncmp(str, "config_acs=", 11)) {
>  				config_acs_param = str + 11;
> +			} else if (!strncmp(str, "reset_wait=", 11)) {
> +				pci_reset_ready_wait =
> +					simple_strtoul(str + 11, &str, 0);
>  			} else {
>  				pr_err("PCI: Unknown option `%s'\n", str);
>  			}
>
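The two review comments (use the variable directly instead of hiding it behind a define, and carry the unit in its name) might look roughly like the following user-space sketch. The name pci_reset_ready_wait_ms and the helper parse_reset_wait() are assumptions made for illustration, and the standard C strtoul() stands in for the kernel's string helpers. The sketch handles a single option string, not the comma-separated list that pci_setup() walks.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Unit carried in the name; callers use the variable directly. */
static unsigned long pci_reset_ready_wait_ms = 60000;

/*
 * Parse one "reset_wait=<msec>" option (hypothetical helper).
 * Returns 0 on success, -1 if the option does not match or the
 * value is malformed. On failure the current value is left alone.
 */
static int parse_reset_wait(const char *opt)
{
	static const char prefix[] = "reset_wait=";
	const char *val;
	char *end;
	unsigned long ms;

	assert(opt != NULL);
	if (strncmp(opt, prefix, sizeof(prefix) - 1) != 0)
		return -1;

	val = opt + sizeof(prefix) - 1;
	if (*val < '0' || *val > '9')	/* reject empty values, signs, spaces */
		return -1;

	errno = 0;
	ms = strtoul(val, &end, 0);	/* base 0: decimal, octal or hex */
	if (errno != 0 || *end != '\0')
		return -1;

	pci_reset_ready_wait_ms = ms;
	return 0;
}
```

Unlike the posted patch, this variant rejects trailing garbage instead of silently accepting a partial number; that is a design choice of the sketch, not something requested in the review.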
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index fb8752b42ec85..1aed555ef8b40 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -4843,6 +4843,9 @@
 
 				Note: this may remove isolation between devices
 				and may put more devices in an IOMMU group.
+		reset_wait=nn	The number of milliseconds to wait after a
+				reset while seeing Request Retry Status.
+				Default is 60000 (1 minute).
 		force_floating	[S390] Force usage of floating interrupts.
 		nomio		[S390] Do not use MIO instructions.
 		norid		[S390] ignore the RID field and force use of
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 869d204a70a37..20817dd5ebba7 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -75,7 +75,8 @@ struct pci_pme_device {
  * limit, but 60 sec ought to be enough for any device to become
  * responsive.
  */
-#define PCIE_RESET_READY_POLL_MS 60000	/* msec */
+#define PCIE_RESET_READY_POLL_MS pci_reset_ready_wait
+unsigned long pci_reset_ready_wait = 60000; /* msec */
 
 static void pci_dev_d3_sleep(struct pci_dev *dev)
 {
@@ -6841,6 +6842,9 @@ static int __init pci_setup(char *str)
 				disable_acs_redir_param = str + 18;
 			} else if (!strncmp(str, "config_acs=", 11)) {
 				config_acs_param = str + 11;
+			} else if (!strncmp(str, "reset_wait=", 11)) {
+				pci_reset_ready_wait =
+					simple_strtoul(str + 11, &str, 0);
 			} else {
 				pr_err("PCI: Unknown option `%s'\n", str);
 			}
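To make the effect of the timeout concrete, here is a user-space simulation of waiting for a device that keeps answering with Request Retry Status. It is a sketch under stated assumptions, not the kernel's code: struct sim_dev and both functions are hypothetical, the doubling back-off is an assumption of this sketch, and time is simulated by accumulating the delays rather than sleeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simulated device: not ready until ready_at_ms has elapsed. */
struct sim_dev {
	unsigned long ready_at_ms;
};

/* True once the simulated device would stop returning Request Retry Status. */
static bool sim_dev_ready(const struct sim_dev *dev, unsigned long elapsed_ms)
{
	return elapsed_ms >= dev->ready_at_ms;
}

/*
 * Poll until the device is ready, doubling the delay after each attempt.
 * Gives up once the next delay would exceed timeout_ms.
 * Returns the simulated elapsed time in msec, or -1 on timeout.
 */
static long sim_wait_ready(const struct sim_dev *dev, unsigned long timeout_ms)
{
	unsigned long delay_ms = 1;
	unsigned long elapsed_ms = 0;

	assert(dev != NULL);
	while (!sim_dev_ready(dev, elapsed_ms)) {
		if (delay_ms > timeout_ms)
			return -1;
		elapsed_ms += delay_ms;	/* stands in for sleeping delay_ms */
		delay_ms *= 2;
	}
	return (long)elapsed_ms;
}
```

With these assumptions, a simulated device that needs 90 seconds times out under the 60000 ms default but succeeds when the limit is raised to 120000 ms, which is the situation the reset_wait= parameter is meant to address.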