Message ID | 20240202071110.8515-3-jhp@endlessos.org (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Bjorn Helgaas |
Series | [v2] PCI: vmd: Enable PCI PM's L1 substates of remapped PCIe Root Port and NVMe |
On Fri, Feb 02, 2024 at 03:11:12PM +0800, Jian-Hong Pan wrote:
> The remapped PCIe Root Port and NVMe have the PCI PM L1 substates
> capability, but it is initially disabled:
>
> Here is an example on ASUS B1400CEAE:
>
>   Capabilities: [900 v1] L1 PM Substates
>     L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+
>               PortCommonModeRestoreTime=32us PortTPowerOnTime=10us
>     L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
>                T_CommonMode=0us LTR1.2_Threshold=0ns
>     L1SubCtl2: T_PwrOn=10us
>
> Power on all of the VMD remapped PCI devices and quirk the max snoop
> LTR before enabling PCI-PM L1 PM Substates, following Section 5.5.4
> of PCIe Base Spec Revision 6.0. Then, the PCI PM L1 substates
> controls are initialized and enabled accordingly. Also, update the
> comments of pci_enable_link_state() and
> pci_enable_link_state_locked() for kernel-doc.

The aspm.c changes should be in a separate patch. Presumably the
aspm.c code change fixes a defect, and that defect can be described
in that patch. That fix may be needed for non-VMD situations, and
having it in this vmd patch means it won't be as easy to find and
backport.

Nit: rewrap commit log to fill 75 columns.

> @@ -751,11 +751,9 @@ static int vmd_pm_enable_quirk(struct pci_dev *pdev, void *userdata)
>  	if (!(features & VMD_FEAT_BIOS_PM_QUIRK))
>  		return 0;
>
> -	pci_enable_link_state_locked(pdev, PCIE_LINK_STATE_ALL);
> -
>  	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_LTR);
>  	if (!pos)
> -		return 0;
> +		goto out_enable_link_state;
>
>  	/*
>  	 * Skip if the max snoop LTR is non-zero, indicating BIOS has set it
> @@ -763,7 +761,7 @@ static int vmd_pm_enable_quirk(struct pci_dev *pdev, void *userdata)
>  	 */
>  	pci_read_config_dword(pdev, pos + PCI_LTR_MAX_SNOOP_LAT, &ltr_reg);
>  	if (!!(ltr_reg & (PCI_LTR_VALUE_MASK | PCI_LTR_SCALE_MASK)))
> -		return 0;
> +		goto out_enable_link_state;
>
>  	/*
>  	 * Set the default values to the maximum required by the platform to
> @@ -775,6 +773,14 @@ static int vmd_pm_enable_quirk(struct pci_dev *pdev, void *userdata)
>  	pci_write_config_dword(pdev, pos + PCI_LTR_MAX_SNOOP_LAT, ltr_reg);
>  	pci_info(pdev, "VMD: Default LTR value set by driver\n");

You're not changing this part, and I don't understand exactly how LTR
works, but it makes me a little bit queasy to read "set the LTR value
to the maximum required to allow the deepest power management
savings" and then we set the max snoop values to a fixed constant.

I don't think the goal is to "allow the deepest power savings"; I
think it's to enable L1.2 *when the device has enough buffering to
absorb L1.2 entry/exit latencies*.

The spec (PCIe r6.0, sec 7.8.2.2) says "Software should set this to
the platform's maximum supported latency or less," so it seems like
that value must be platform-dependent, not fixed.

And I assume the "_DSM for Latency Tolerance Reporting" is part of the
way to get those platform-dependent values, but Linux doesn't actually
use that yet.

I assume that setting the max values incorrectly may lead to either
being too conservative, so we don't use L1.2 when we could, or too
aggressive, so we use L1.2 when we shouldn't, and the device loses
data because it doesn't have enough internal buffering to absorb the
entry/exit delays.

This paper has a lot of background and might help answer some of my
questions:
https://www.intel.co.za/content/dam/doc/white-paper/energy-efficient-platforms-white-paper.pdf

> +out_enable_link_state:
> +	/*
> +	 * Make PCI devices at D0 when enable PCI-PM L1 PM Substates from
> +	 * Section 5.5.4 of PCIe Base Spec Revision 6.0
> +	 */
> +	pci_set_power_state_locked(pdev, PCI_D0);
> +	pci_enable_link_state_locked(pdev, PCIE_LINK_STATE_ALL);

Hmmm. PCIE_LINK_STATE_ALL includes ASPM L1.2. We may do this even if
the device doesn't have an LTR Capability. ASPM L1.2 cannot work
without LTR.

I only took a quick look but was not convinced that
pci_enable_link_state() does the right thing when we enable
PCIE_LINK_STATE_ALL (including ASPM L1.2) on a device that doesn't
have LTR. I think it *should* decline to set PCI_L1SS_CTL1_ASPM_L1_2,
but I'm not sure it does. Can you double check that path? Maybe
that's another defect in aspm.c.

> @@ -1164,6 +1164,8 @@ static int __pci_enable_link_state(struct pci_dev *pdev, int state, bool locked)
>  		link->aspm_default |= ASPM_STATE_L1_1_PCIPM | ASPM_STATE_L1;
>  	if (state & PCIE_LINK_STATE_L1_2_PCIPM)
>  		link->aspm_default |= ASPM_STATE_L1_2_PCIPM | ASPM_STATE_L1;
> +	if (state & ASPM_STATE_L1_2_MASK)
> +		aspm_l1ss_init(link);
>  	pcie_config_aspm_link(link, policy_to_aspm_state(link));
>
>  	link->clkpm_default = (state & PCIE_LINK_STATE_CLKPM) ? 1 : 0;
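For reference, each 16-bit latency field in the register the quirk
writes encodes latency as a 10-bit value plus a 3-bit scale, where each
scale step multiplies the value by 32. A minimal sketch of that
encoding follows; the helper name is invented for illustration (it is
not part of the patch), and the truncate-down choice follows the spec's
"maximum supported latency or less" wording (PCIe r6.0, sec 7.8.2.2):

#include <linux/pci.h>	/* PCI_LTR_VALUE_MASK, PCI_LTR_SCALE_SHIFT */

/*
 * Illustrative only: encode a latency in nanoseconds into the LTR
 * "value + scale" format used by the Max Snoop Latency field.
 * The 10-bit value is multiplied by 32^scale ns, so each scale step
 * trades resolution for range.
 */
static u32 ltr_encode_ns(u64 ns)
{
	u32 scale = 0;

	while (ns > PCI_LTR_VALUE_MASK && scale < 5) {
		ns >>= 5;			/* divide by 32, truncating */
		scale++;
	}
	if (ns > PCI_LTR_VALUE_MASK)		/* clamp at max encodable */
		ns = PCI_LTR_VALUE_MASK;

	return ((u32)ns & PCI_LTR_VALUE_MASK) | (scale << PCI_LTR_SCALE_SHIFT);
}

Truncation keeps the advertised tolerance at or below the requested
one, which is the conservative direction for a value that must be the
platform's "maximum supported latency or less".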
On Fri, 2024-02-02 at 18:05 -0600, Bjorn Helgaas wrote:
> On Fri, Feb 02, 2024 at 03:11:12PM +0800, Jian-Hong Pan wrote:
> > The remapped PCIe Root Port and NVMe have the PCI PM L1 substates
> > capability, but it is initially disabled:
> >
> > Here is an example on ASUS B1400CEAE:
> >
> >   Capabilities: [900 v1] L1 PM Substates
> >     L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+
> >               PortCommonModeRestoreTime=32us PortTPowerOnTime=10us
> >     L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
> >                T_CommonMode=0us LTR1.2_Threshold=0ns
> >     L1SubCtl2: T_PwrOn=10us
> >
> > Power on all of the VMD remapped PCI devices and quirk the max snoop
> > LTR before enabling PCI-PM L1 PM Substates, following Section 5.5.4
> > of PCIe Base Spec Revision 6.0. Then, the PCI PM L1 substates
> > controls are initialized and enabled accordingly. Also, update the
> > comments of pci_enable_link_state() and
> > pci_enable_link_state_locked() for kernel-doc.
>
> The aspm.c changes should be in a separate patch. Presumably the
> aspm.c code change fixes a defect, and that defect can be described
> in that patch. That fix may be needed for non-VMD situations, and
> having it in this vmd patch means it won't be as easy to find and
> backport.
>
> Nit: rewrap commit log to fill 75 columns.
>
> > @@ -751,11 +751,9 @@ static int vmd_pm_enable_quirk(struct pci_dev *pdev, void *userdata)
> >  	if (!(features & VMD_FEAT_BIOS_PM_QUIRK))
> >  		return 0;
> >
> > -	pci_enable_link_state_locked(pdev, PCIE_LINK_STATE_ALL);
> > -
> >  	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_LTR);
> >  	if (!pos)
> > -		return 0;
> > +		goto out_enable_link_state;
> >
> >  	/*
> >  	 * Skip if the max snoop LTR is non-zero, indicating BIOS has set it
> > @@ -763,7 +761,7 @@ static int vmd_pm_enable_quirk(struct pci_dev *pdev, void *userdata)
> >  	 */
> >  	pci_read_config_dword(pdev, pos + PCI_LTR_MAX_SNOOP_LAT, &ltr_reg);
> >  	if (!!(ltr_reg & (PCI_LTR_VALUE_MASK | PCI_LTR_SCALE_MASK)))
> > -		return 0;
> > +		goto out_enable_link_state;
> >
> >  	/*
> >  	 * Set the default values to the maximum required by the platform to
> > @@ -775,6 +773,14 @@ static int vmd_pm_enable_quirk(struct pci_dev *pdev, void *userdata)
> >  	pci_write_config_dword(pdev, pos + PCI_LTR_MAX_SNOOP_LAT, ltr_reg);
> >  	pci_info(pdev, "VMD: Default LTR value set by driver\n");
>
> You're not changing this part, and I don't understand exactly how LTR
> works, but it makes me a little bit queasy to read "set the LTR value
> to the maximum required to allow the deepest power management
> savings" and then we set the max snoop values to a fixed constant.
>
> I don't think the goal is to "allow the deepest power savings"; I
> think it's to enable L1.2 *when the device has enough buffering to
> absorb L1.2 entry/exit latencies*.
>
> The spec (PCIe r6.0, sec 7.8.2.2) says "Software should set this to
> the platform's maximum supported latency or less," so it seems like
> that value must be platform-dependent, not fixed.
>
> And I assume the "_DSM for Latency Tolerance Reporting" is part of the
> way to get those platform-dependent values, but Linux doesn't actually
> use that yet.

This may indeed be the best way but we need to double check with our
BIOS folks. AFAIK BIOS writes the LTR values directly so there hasn't
been a need to use this _DSM. But under VMD the ports are hidden from
BIOS, which is why we added it here.

I've brought up the question internally to find out how Windows
handles the DSM and to get a recommendation from our firmware leads.

> I assume that setting the max values incorrectly may lead to either
> being too conservative, so we don't use L1.2 when we could, or too
> aggressive, so we use L1.2 when we shouldn't, and the device loses
> data because it doesn't have enough internal buffering to absorb the
> entry/exit delays.
>
> This paper has a lot of background and might help answer some of my
> questions:
> https://www.intel.co.za/content/dam/doc/white-paper/energy-efficient-platforms-white-paper.pdf
>
> > +out_enable_link_state:
> > +	/*
> > +	 * Make PCI devices at D0 when enable PCI-PM L1 PM Substates from
> > +	 * Section 5.5.4 of PCIe Base Spec Revision 6.0
> > +	 */
> > +	pci_set_power_state_locked(pdev, PCI_D0);
> > +	pci_enable_link_state_locked(pdev, PCIE_LINK_STATE_ALL);
>
> Hmmm. PCIE_LINK_STATE_ALL includes ASPM L1.2. We may do this even if
> the device doesn't have an LTR Capability. ASPM L1.2 cannot work
> without LTR.
>
> I only took a quick look but was not convinced that
> pci_enable_link_state() does the right thing when we enable
> PCIE_LINK_STATE_ALL (including ASPM L1.2) on a device that doesn't
> have LTR. I think it *should* decline to set PCI_L1SS_CTL1_ASPM_L1_2,
> but I'm not sure it does. Can you double check that path? Maybe
> that's another defect in aspm.c.

It doesn't currently decline. The same scenario happens when the user
selects powersupersave. If a device advertises L1.2 with the capable
registers set, it should also have the LTR register present. But it
doesn't hurt to check.

David

> > @@ -1164,6 +1164,8 @@ static int __pci_enable_link_state(struct pci_dev *pdev, int state, bool locked)
> >  		link->aspm_default |= ASPM_STATE_L1_1_PCIPM | ASPM_STATE_L1;
> >  	if (state & PCIE_LINK_STATE_L1_2_PCIPM)
> >  		link->aspm_default |= ASPM_STATE_L1_2_PCIPM | ASPM_STATE_L1;
> > +	if (state & ASPM_STATE_L1_2_MASK)
> > +		aspm_l1ss_init(link);
> >  	pcie_config_aspm_link(link, policy_to_aspm_state(link));
> >
> >  	link->clkpm_default = (state & PCIE_LINK_STATE_CLKPM) ? 1 : 0;
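A minimal sketch of the kind of guard being discussed; the helper names
are invented here and this is not current aspm.c code, which would hang
such a check off the link state rather than standalone lookups:

#include <linux/pci.h>

/*
 * Hypothetical guard: ASPM L1.2 entry depends on LTR (PCIe r6.0,
 * sec 5.5.1), so strip it from a requested state mask when either
 * end of the link lacks the LTR Extended Capability.
 */
static int clamp_aspm_l1_2(struct pci_dev *parent, struct pci_dev *child,
			   int state)
{
	if (!pci_find_ext_capability(parent, PCI_EXT_CAP_ID_LTR) ||
	    !pci_find_ext_capability(child, PCI_EXT_CAP_ID_LTR))
		state &= ~PCIE_LINK_STATE_L1_2;	/* decline ASPM L1.2 */

	return state;
}

With such a mask applied before PCI_L1SS_CTL1 is written, a request for
PCIE_LINK_STATE_ALL would quietly degrade instead of enabling ASPM L1.2
on a link that cannot report latency tolerance.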
On Mon, Feb 05, 2024 at 11:37:16AM -0800, David E. Box wrote:
> On Fri, 2024-02-02 at 18:05 -0600, Bjorn Helgaas wrote:
> > On Fri, Feb 02, 2024 at 03:11:12PM +0800, Jian-Hong Pan wrote:
> ...
> > > @@ -775,6 +773,14 @@ static int vmd_pm_enable_quirk(struct pci_dev *pdev, void *userdata)
> > >  	pci_write_config_dword(pdev, pos + PCI_LTR_MAX_SNOOP_LAT, ltr_reg);
> > >  	pci_info(pdev, "VMD: Default LTR value set by driver\n");
> >
> > You're not changing this part, and I don't understand exactly how LTR
> > works, but it makes me a little bit queasy to read "set the LTR value
> > to the maximum required to allow the deepest power management
> > savings" and then we set the max snoop values to a fixed constant.
> >
> > I don't think the goal is to "allow the deepest power savings"; I
> > think it's to enable L1.2 *when the device has enough buffering to
> > absorb L1.2 entry/exit latencies*.
> >
> > The spec (PCIe r6.0, sec 7.8.2.2) says "Software should set this to
> > the platform's maximum supported latency or less," so it seems like
> > that value must be platform-dependent, not fixed.
> >
> > And I assume the "_DSM for Latency Tolerance Reporting" is part of the
> > way to get those platform-dependent values, but Linux doesn't actually
> > use that yet.
>
> This may indeed be the best way but we need to double check with our
> BIOS folks. AFAIK BIOS writes the LTR values directly so there hasn't
> been a need to use this _DSM. But under VMD the ports are hidden from
> BIOS, which is why we added it here. I've brought up the question
> internally to find out how Windows handles the DSM and to get a
> recommendation from our firmware leads.

We want Linux to be able to program LTR itself, don't we? We shouldn't
have to rely on firmware to do it. If Linux can't do it, hot-added
devices aren't going to be able to use L1.2, right?

Bjorn
On Mon, 2024-02-05 at 16:42 -0600, Bjorn Helgaas wrote:
> On Mon, Feb 05, 2024 at 11:37:16AM -0800, David E. Box wrote:
> > On Fri, 2024-02-02 at 18:05 -0600, Bjorn Helgaas wrote:
> > > On Fri, Feb 02, 2024 at 03:11:12PM +0800, Jian-Hong Pan wrote:
> > ...
> > > > @@ -775,6 +773,14 @@ static int vmd_pm_enable_quirk(struct pci_dev *pdev, void *userdata)
> > > >  	pci_write_config_dword(pdev, pos + PCI_LTR_MAX_SNOOP_LAT, ltr_reg);
> > > >  	pci_info(pdev, "VMD: Default LTR value set by driver\n");
> > >
> > > You're not changing this part, and I don't understand exactly how LTR
> > > works, but it makes me a little bit queasy to read "set the LTR value
> > > to the maximum required to allow the deepest power management
> > > savings" and then we set the max snoop values to a fixed constant.
> > >
> > > I don't think the goal is to "allow the deepest power savings"; I
> > > think it's to enable L1.2 *when the device has enough buffering to
> > > absorb L1.2 entry/exit latencies*.
> > >
> > > The spec (PCIe r6.0, sec 7.8.2.2) says "Software should set this to
> > > the platform's maximum supported latency or less," so it seems like
> > > that value must be platform-dependent, not fixed.
> > >
> > > And I assume the "_DSM for Latency Tolerance Reporting" is part of
> > > the way to get those platform-dependent values, but Linux doesn't
> > > actually use that yet.
> >
> > This may indeed be the best way but we need to double check with our
> > BIOS folks. AFAIK BIOS writes the LTR values directly so there hasn't
> > been a need to use this _DSM. But under VMD the ports are hidden from
> > BIOS, which is why we added it here. I've brought up the question
> > internally to find out how Windows handles the DSM and to get a
> > recommendation from our firmware leads.
>
> We want Linux to be able to program LTR itself, don't we? We shouldn't
> have to rely on firmware to do it. If Linux can't do it, hot-added
> devices aren't going to be able to use L1.2, right?

Agreed. We just want to make sure we are not conflicting with what BIOS
may be doing.

David
On Fri, 2 Feb 2024, Jian-Hong Pan wrote:

> The remapped PCIe Root Port and NVMe have the PCI PM L1 substates
> capability, but it is initially disabled:
>
> Here is an example on ASUS B1400CEAE:
>
>   Capabilities: [900 v1] L1 PM Substates
>     L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+
>               PortCommonModeRestoreTime=32us PortTPowerOnTime=10us
>     L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
>                T_CommonMode=0us LTR1.2_Threshold=0ns
>     L1SubCtl2: T_PwrOn=10us
>
> Power on all of the VMD remapped PCI devices and quirk the max snoop
> LTR before enabling PCI-PM L1 PM Substates, following Section 5.5.4
> of PCIe Base Spec Revision 6.0. Then, the PCI PM L1 substates
> controls are initialized and enabled accordingly. Also, update the
> comments of pci_enable_link_state() and
> pci_enable_link_state_locked() for kernel-doc.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=218394
> Signed-off-by: Jian-Hong Pan <jhp@endlessos.org>
> ---
> v2:
> - Power on the VMD remapped devices with pci_set_power_state_locked()
> - Prepare the PCIe LTR parameters before enabling L1 Substates
> - Add a note to the comments of both pci_enable_link_state() and
>   pci_enable_link_state_locked() for kernel-doc
> - Split the original patch set into individual patches
>
>  drivers/pci/controller/vmd.c | 15 ++++++++++-----
>  drivers/pci/pcie/aspm.c      | 10 ++++++++++
>  2 files changed, 20 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
> index 87b7856f375a..66e47a0dbf1a 100644
> --- a/drivers/pci/controller/vmd.c
> +++ b/drivers/pci/controller/vmd.c
> @@ -751,11 +751,9 @@ static int vmd_pm_enable_quirk(struct pci_dev *pdev, void *userdata)
>  	if (!(features & VMD_FEAT_BIOS_PM_QUIRK))
>  		return 0;
>
> -	pci_enable_link_state_locked(pdev, PCIE_LINK_STATE_ALL);
> -
>  	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_LTR);
>  	if (!pos)
> -		return 0;
> +		goto out_enable_link_state;
>
>  	/*
>  	 * Skip if the max snoop LTR is non-zero, indicating BIOS has set it
> @@ -763,7 +761,7 @@ static int vmd_pm_enable_quirk(struct pci_dev *pdev, void *userdata)
>  	 */
>  	pci_read_config_dword(pdev, pos + PCI_LTR_MAX_SNOOP_LAT, &ltr_reg);
>  	if (!!(ltr_reg & (PCI_LTR_VALUE_MASK | PCI_LTR_SCALE_MASK)))
> -		return 0;
> +		goto out_enable_link_state;
>
>  	/*
>  	 * Set the default values to the maximum required by the platform to
> @@ -775,6 +773,14 @@ static int vmd_pm_enable_quirk(struct pci_dev *pdev, void *userdata)
>  	pci_write_config_dword(pdev, pos + PCI_LTR_MAX_SNOOP_LAT, ltr_reg);
>  	pci_info(pdev, "VMD: Default LTR value set by driver\n");
>
> +out_enable_link_state:
> +	/*
> +	 * Make PCI devices at D0 when enable PCI-PM L1 PM Substates from
> +	 * Section 5.5.4 of PCIe Base Spec Revision 6.0

I don't understand what you are trying to say here. Are there some
typos or grammar errors, or is something entirely missing from the
comment?

> +	 */
> +	pci_set_power_state_locked(pdev, PCI_D0);
> +	pci_enable_link_state_locked(pdev, PCIE_LINK_STATE_ALL);
> +
>  	return 0;
>  }
>
> @@ -926,7 +932,6 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
>  			dev_get_msi_domain(&vmd->dev->dev));
>
>  	vmd_acpi_begin();
> -
>  	pci_scan_child_bus(vmd->bus);
>  	vmd_domain_reset(vmd);

Spurious newline change.
On Tue, Feb 06, 2024 at 02:29:12PM +0200, Ilpo Järvinen wrote:
> On Fri, 2 Feb 2024, Jian-Hong Pan wrote:
> ...
> > +out_enable_link_state:
> > +	/*
> > +	 * Make PCI devices at D0 when enable PCI-PM L1 PM Substates from
> > +	 * Section 5.5.4 of PCIe Base Spec Revision 6.0
>
> I don't understand what you are trying to say here. Are there some
> typos or grammar errors, or is something entirely missing from the
> comment?

This is about the fact that per sec 5.5.4, "If setting either or both
of the enable bits for PCI-PM L1 PM Substates, both ports must be
configured as described in this section while in D0."

We can wordsmith this a little, maybe:

  Ensure devices are in D0 before enabling PCI-PM L1 PM Substates, per
  PCIe r6.0, sec 5.5.4.

I look a little askance at having to do this separately from
pci_enable_link_state_locked(), but we can solve that elsewhere if
need be.

> > +	 */
> > +	pci_set_power_state_locked(pdev, PCI_D0);
> > +	pci_enable_link_state_locked(pdev, PCIE_LINK_STATE_ALL);
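The ordering constraint itself is small enough to show directly. This
sketch only mirrors the two calls from the quirk above; the wrapper
name is invented:

/*
 * Per PCIe r6.0, sec 5.5.4, the device must be in D0 while the PCI-PM
 * L1 Substates enable bits are written, so restore D0 first and only
 * then ask aspm.c to enable the link states.
 */
static void vmd_enable_l1ss_in_d0(struct pci_dev *pdev)
{
	pci_set_power_state_locked(pdev, PCI_D0);
	pci_enable_link_state_locked(pdev, PCIE_LINK_STATE_ALL);
}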
Adding Puranjay

On Mon, 2024-02-05 at 15:05 -0800, David E. Box wrote:
> On Mon, 2024-02-05 at 16:42 -0600, Bjorn Helgaas wrote:
> > On Mon, Feb 05, 2024 at 11:37:16AM -0800, David E. Box wrote:
> > > On Fri, 2024-02-02 at 18:05 -0600, Bjorn Helgaas wrote:
> > > > On Fri, Feb 02, 2024 at 03:11:12PM +0800, Jian-Hong Pan wrote:
> > > ...
> > > > > @@ -775,6 +773,14 @@ static int vmd_pm_enable_quirk(struct pci_dev *pdev, void *userdata)
> > > > >  	pci_write_config_dword(pdev, pos + PCI_LTR_MAX_SNOOP_LAT, ltr_reg);
> > > > >  	pci_info(pdev, "VMD: Default LTR value set by driver\n");
> > > >
> > > > You're not changing this part, and I don't understand exactly how
> > > > LTR works, but it makes me a little bit queasy to read "set the LTR
> > > > value to the maximum required to allow the deepest power management
> > > > savings" and then we set the max snoop values to a fixed constant.
> > > >
> > > > I don't think the goal is to "allow the deepest power savings"; I
> > > > think it's to enable L1.2 *when the device has enough buffering to
> > > > absorb L1.2 entry/exit latencies*.
> > > >
> > > > The spec (PCIe r6.0, sec 7.8.2.2) says "Software should set this to
> > > > the platform's maximum supported latency or less," so it seems like
> > > > that value must be platform-dependent, not fixed.
> > > >
> > > > And I assume the "_DSM for Latency Tolerance Reporting" is part of
> > > > the way to get those platform-dependent values, but Linux doesn't
> > > > actually use that yet.
> > >
> > > This may indeed be the best way but we need to double check with our
> > > BIOS folks. AFAIK BIOS writes the LTR values directly so there
> > > hasn't been a need to use this _DSM. But under VMD the ports are
> > > hidden from BIOS, which is why we added it here. I've brought up the
> > > question internally to find out how Windows handles the DSM and to
> > > get a recommendation from our firmware leads.
> >
> > We want Linux to be able to program LTR itself, don't we? We
> > shouldn't have to rely on firmware to do it. If Linux can't do
> > it, hot-added devices aren't going to be able to use L1.2, right?
>
> Agreed. We just want to make sure we are not conflicting with what
> BIOS may be doing.

So the feedback is to run the _DSM and just overwrite any BIOS values.
Looking up the _DSM I saw there was an attempt to upstream this 4
years ago [1]. I'm not sure why the effort stalled but we can pick up
this work again.

[1] https://patchwork.kernel.org/project/linux-pci/patch/20201015080311.7811-1-puranjay12@gmail.com/
On Tue, Feb 06, 2024 at 01:25:29PM -0800, David E. Box wrote:
> On Mon, 2024-02-05 at 15:05 -0800, David E. Box wrote:
> > On Mon, 2024-02-05 at 16:42 -0600, Bjorn Helgaas wrote:
> > > On Mon, Feb 05, 2024 at 11:37:16AM -0800, David E. Box wrote:
> > > > On Fri, 2024-02-02 at 18:05 -0600, Bjorn Helgaas wrote:
> > > > > On Fri, Feb 02, 2024 at 03:11:12PM +0800, Jian-Hong Pan wrote:
> > > > ...
> > > > > > @@ -775,6 +773,14 @@ static int vmd_pm_enable_quirk(struct pci_dev *pdev, void *userdata)
> > > > > >  	pci_write_config_dword(pdev, pos + PCI_LTR_MAX_SNOOP_LAT, ltr_reg);
> > > > > >  	pci_info(pdev, "VMD: Default LTR value set by driver\n");
> > > > >
> > > > > You're not changing this part, and I don't understand exactly how
> > > > > LTR works, but it makes me a little bit queasy to read "set the
> > > > > LTR value to the maximum required to allow the deepest power
> > > > > management savings" and then we set the max snoop values to a
> > > > > fixed constant.
> > > > >
> > > > > I don't think the goal is to "allow the deepest power savings"; I
> > > > > think it's to enable L1.2 *when the device has enough buffering
> > > > > to absorb L1.2 entry/exit latencies*.
> > > > >
> > > > > The spec (PCIe r6.0, sec 7.8.2.2) says "Software should set this
> > > > > to the platform's maximum supported latency or less," so it seems
> > > > > like that value must be platform-dependent, not fixed.
> > > > >
> > > > > And I assume the "_DSM for Latency Tolerance Reporting" is part
> > > > > of the way to get those platform-dependent values, but Linux
> > > > > doesn't actually use that yet.
> > > >
> > > > This may indeed be the best way but we need to double check with
> > > > our BIOS folks. AFAIK BIOS writes the LTR values directly so there
> > > > hasn't been a need to use this _DSM. But under VMD the ports are
> > > > hidden from BIOS, which is why we added it here. I've brought up
> > > > the question internally to find out how Windows handles the DSM
> > > > and to get a recommendation from our firmware leads.
> > >
> > > We want Linux to be able to program LTR itself, don't we? We
> > > shouldn't have to rely on firmware to do it. If Linux can't do
> > > it, hot-added devices aren't going to be able to use L1.2,
> > > right?
> >
> > Agreed. We just want to make sure we are not conflicting with what
> > BIOS may be doing.
>
> So the feedback is to run the _DSM and just overwrite any BIOS
> values. Looking up the _DSM I saw there was an attempt to upstream
> this 4 years ago [1]. I'm not sure why the effort stalled but we can
> pick up this work again.
>
> [1] https://patchwork.kernel.org/project/linux-pci/patch/20201015080311.7811-1-puranjay12@gmail.com/

There was a PCI SIG discussion about this a few years ago that never
really seemed to get resolved:
https://members.pcisig.com/wg/PCIe-Protocol/mail/thread/35064

Unfortunately that discussion is not public, but the summary is:

  Q: How is the LTR_L1.2_THRESHOLD value determined?

     PCIe r5.0, sec 5.5.4, says the same value must be programmed
     into both Ports.

  A: As noted in sec 5.5.4, the value is determined primarily by the
     amount of time it will take to re-establish the common mode bias
     on the AC coupling caps, and it is assumed that the BIOS knows
     this.

  Q: How are the LTR Max Snoop values determined?

     PCI Firmware r3.3, sec 4.6.6, says the LTR _DSM reports the max
     values for each Downstream Port embedded in the platform, and
     the OS should calculate latencies along the path between each
     Downstream Port and any Upstream Port (Switch Upstream Port or
     Endpoint).

     Of course, Switches not embedded in the platform (e.g., external
     Thunderbolt hierarchies) will not have this _DSM, but I assume
     they should contribute to this sum?

  A: The fundamental problem is that there is no practical way for
     software to discover the AC coupling capacitor sizes and common
     mode bias circuit impedance.

     Software could compute conservative values, but they would
     likely be 10x worse than typical, so the L1.2 exit latency would
     be significantly longer than actually required to be.

     The interoperability issues here were understood when designing
     L1 Substates, but no viable solution was found.

So the main reason Puranjay's work got stalled is that I didn't feel
confident enough that we understood how to do this, especially for
external devices.

It would be great if somebody *did* feel confident about interpreting
and implementing all this.

Bjorn
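The LTR_L1.2_THRESHOLD fields mentioned in the first question use the
same 10-bit-value-times-32^scale format as LTR itself (PCIe r6.0, sec
7.8.3.3). A rough sketch of the encoding; the helper is illustrative
(aspm.c carries its own version of this logic), and rounding up is my
reading of the safe direction for a threshold that must cover common
mode restoration:

#include <linux/kernel.h>	/* DIV_ROUND_UP */

/*
 * Illustrative encoder for LTR_L1.2_THRESHOLD: 10-bit value scaled by
 * 32^scale ns (scale 0..5).  Round up so the programmed threshold is
 * never below the requested time; per sec 5.5.4, both Ports must be
 * given the same result.
 */
static void l12_threshold_encode(u32 threshold_ns, u32 *scale, u32 *value)
{
	*scale = 0;
	*value = threshold_ns;

	while (*value > 0x3ff && *scale < 5) {
		*value = DIV_ROUND_UP(*value, 32);
		(*scale)++;
	}
}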
On Tue, 2024-02-06 at 17:30 -0600, Bjorn Helgaas wrote:
> On Tue, Feb 06, 2024 at 01:25:29PM -0800, David E. Box wrote:
> > On Mon, 2024-02-05 at 15:05 -0800, David E. Box wrote:
> > > On Mon, 2024-02-05 at 16:42 -0600, Bjorn Helgaas wrote:
> > > > On Mon, Feb 05, 2024 at 11:37:16AM -0800, David E. Box wrote:
> > > > > On Fri, 2024-02-02 at 18:05 -0600, Bjorn Helgaas wrote:
> > > > > > On Fri, Feb 02, 2024 at 03:11:12PM +0800, Jian-Hong Pan wrote:
> > > > > ...
> > > > > > > @@ -775,6 +773,14 @@ static int vmd_pm_enable_quirk(struct pci_dev *pdev, void *userdata)
> > > > > > >  	pci_write_config_dword(pdev, pos + PCI_LTR_MAX_SNOOP_LAT, ltr_reg);
> > > > > > >  	pci_info(pdev, "VMD: Default LTR value set by driver\n");
> > > > > >
> > > > > > You're not changing this part, and I don't understand exactly
> > > > > > how LTR works, but it makes me a little bit queasy to read "set
> > > > > > the LTR value to the maximum required to allow the deepest
> > > > > > power management savings" and then we set the max snoop values
> > > > > > to a fixed constant.
> > > > > >
> > > > > > I don't think the goal is to "allow the deepest power savings";
> > > > > > I think it's to enable L1.2 *when the device has enough
> > > > > > buffering to absorb L1.2 entry/exit latencies*.
> > > > > >
> > > > > > The spec (PCIe r6.0, sec 7.8.2.2) says "Software should set
> > > > > > this to the platform's maximum supported latency or less," so
> > > > > > it seems like that value must be platform-dependent, not fixed.
> > > > > >
> > > > > > And I assume the "_DSM for Latency Tolerance Reporting" is part
> > > > > > of the way to get those platform-dependent values, but Linux
> > > > > > doesn't actually use that yet.
> > > > >
> > > > > This may indeed be the best way but we need to double check with
> > > > > our BIOS folks. AFAIK BIOS writes the LTR values directly so
> > > > > there hasn't been a need to use this _DSM. But under VMD the
> > > > > ports are hidden from BIOS, which is why we added it here. I've
> > > > > brought up the question internally to find out how Windows
> > > > > handles the DSM and to get a recommendation from our firmware
> > > > > leads.
> > > >
> > > > We want Linux to be able to program LTR itself, don't we? We
> > > > shouldn't have to rely on firmware to do it. If Linux can't do
> > > > it, hot-added devices aren't going to be able to use L1.2,
> > > > right?
> > >
> > > Agreed. We just want to make sure we are not conflicting with what
> > > BIOS may be doing.
> >
> > So the feedback is to run the _DSM and just overwrite any BIOS
> > values. Looking up the _DSM I saw there was an attempt to upstream
> > this 4 years ago [1]. I'm not sure why the effort stalled but we can
> > pick up this work again.
> >
> > [1] https://patchwork.kernel.org/project/linux-pci/patch/20201015080311.7811-1-puranjay12@gmail.com/
>
> There was a PCI SIG discussion about this a few years ago that never
> really seemed to get resolved:
> https://members.pcisig.com/wg/PCIe-Protocol/mail/thread/35064
>
> Unfortunately that discussion is not public, but the summary is:
>
>   Q: How is the LTR_L1.2_THRESHOLD value determined?
>
>      PCIe r5.0, sec 5.5.4, says the same value must be programmed
>      into both Ports.
>
>   A: As noted in sec 5.5.4, the value is determined primarily by the
>      amount of time it will take to re-establish the common mode bias
>      on the AC coupling caps, and it is assumed that the BIOS knows
>      this.
>
>   Q: How are the LTR Max Snoop values determined?
>
>      PCI Firmware r3.3, sec 4.6.6, says the LTR _DSM reports the max
>      values for each Downstream Port embedded in the platform, and
>      the OS should calculate latencies along the path between each
>      Downstream Port and any Upstream Port (Switch Upstream Port or
>      Endpoint).
>
>      Of course, Switches not embedded in the platform (e.g., external
>      Thunderbolt hierarchies) will not have this _DSM, but I assume
>      they should contribute to this sum?
>
>   A: The fundamental problem is that there is no practical way for
>      software to discover the AC coupling capacitor sizes and common
>      mode bias circuit impedance.
>
>      Software could compute conservative values, but they would
>      likely be 10x worse than typical, so the L1.2 exit latency would
>      be significantly longer than actually required to be.
>
>      The interoperability issues here were understood when designing
>      L1 Substates, but no viable solution was found.
>
> So the main reason Puranjay's work got stalled is that I didn't feel
> confident enough that we understood how to do this, especially for
> external devices.
>
> It would be great if somebody *did* feel confident about interpreting
> and implementing all this.

As it is, BIOS (at least Intel BIOS) is already writing the maximum
allowed LTR value on Upstream Ports that have it set to 0. So we can't
do any worse if we write the BIOS-provided _DSM value for all Upstream
Ports, including external devices.

Sounds like the worst case scenario is that devices take longer than
needed to exit L1.2 (I'm still asking about this detail). But I think
this is better than not programming the LTR at all, which could
prevent the platform from power gating the very resources that LTR is
meant to help manage.

If that reasoning is okay with you, I'll submit patches to use the
_DSM.

David
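For context, evaluating that _DSM would look roughly like the sketch
below. acpi_evaluate_dsm() and pci_acpi_dsm_guid exist in the kernel
today, but the function index, the revision, and the four-integer
result layout are taken from a reading of PCI Firmware r3.3, sec 4.6.6,
and the constant name is invented, so treat all of those as assumptions
to verify:

#include <linux/acpi.h>
#include <linux/pci.h>
#include <linux/pci-acpi.h>

#define DSM_PCI_LTR_MAX_LATENCY	0x06	/* assumed index per PCI FW r3.3 */

/*
 * Sketch: ask the platform for a Downstream Port's max snoop/no-snoop
 * LTR.  Assumes a package of four integers in this order: max snoop
 * scale, max snoop value, max no-snoop scale, max no-snoop value.
 */
static int pci_ltr_dsm_query(struct pci_dev *port)
{
	acpi_handle handle = ACPI_HANDLE(&port->dev);
	union acpi_object *obj;

	if (!handle)
		return -ENODEV;

	obj = acpi_evaluate_dsm(handle, &pci_acpi_dsm_guid, 2 /* assumed rev */,
				DSM_PCI_LTR_MAX_LATENCY, NULL);
	if (!obj)
		return -ENODEV;

	if (obj->type == ACPI_TYPE_PACKAGE && obj->package.count == 4)
		pci_info(port, "LTR _DSM: snoop %llu/%llu no-snoop %llu/%llu\n",
			 obj->package.elements[0].integer.value,
			 obj->package.elements[1].integer.value,
			 obj->package.elements[2].integer.value,
			 obj->package.elements[3].integer.value);

	ACPI_FREE(obj);
	return 0;
}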
diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index 87b7856f375a..66e47a0dbf1a 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -751,11 +751,9 @@ static int vmd_pm_enable_quirk(struct pci_dev *pdev, void *userdata)
 	if (!(features & VMD_FEAT_BIOS_PM_QUIRK))
 		return 0;
 
-	pci_enable_link_state_locked(pdev, PCIE_LINK_STATE_ALL);
-
 	pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_LTR);
 	if (!pos)
-		return 0;
+		goto out_enable_link_state;
 
 	/*
 	 * Skip if the max snoop LTR is non-zero, indicating BIOS has set it
@@ -763,7 +761,7 @@ static int vmd_pm_enable_quirk(struct pci_dev *pdev, void *userdata)
 	 */
 	pci_read_config_dword(pdev, pos + PCI_LTR_MAX_SNOOP_LAT, &ltr_reg);
 	if (!!(ltr_reg & (PCI_LTR_VALUE_MASK | PCI_LTR_SCALE_MASK)))
-		return 0;
+		goto out_enable_link_state;
 
 	/*
 	 * Set the default values to the maximum required by the platform to
@@ -775,6 +773,14 @@ static int vmd_pm_enable_quirk(struct pci_dev *pdev, void *userdata)
 	pci_write_config_dword(pdev, pos + PCI_LTR_MAX_SNOOP_LAT, ltr_reg);
 	pci_info(pdev, "VMD: Default LTR value set by driver\n");
 
+out_enable_link_state:
+	/*
+	 * Make PCI devices at D0 when enable PCI-PM L1 PM Substates from
+	 * Section 5.5.4 of PCIe Base Spec Revision 6.0
+	 */
+	pci_set_power_state_locked(pdev, PCI_D0);
+	pci_enable_link_state_locked(pdev, PCIE_LINK_STATE_ALL);
+
 	return 0;
 }
 
@@ -926,7 +932,6 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
 			dev_get_msi_domain(&vmd->dev->dev));
 
 	vmd_acpi_begin();
-
 	pci_scan_child_bus(vmd->bus);
 	vmd_domain_reset(vmd);
 
diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index bc0bd86695ec..5f902c5552ca 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -1164,6 +1164,8 @@ static int __pci_enable_link_state(struct pci_dev *pdev, int state, bool locked)
 		link->aspm_default |= ASPM_STATE_L1_1_PCIPM | ASPM_STATE_L1;
 	if (state & PCIE_LINK_STATE_L1_2_PCIPM)
 		link->aspm_default |= ASPM_STATE_L1_2_PCIPM | ASPM_STATE_L1;
+	if (state & ASPM_STATE_L1_2_MASK)
+		aspm_l1ss_init(link);
 	pcie_config_aspm_link(link, policy_to_aspm_state(link));
 
 	link->clkpm_default = (state & PCIE_LINK_STATE_CLKPM) ? 1 : 0;
@@ -1182,6 +1184,10 @@ static int __pci_enable_link_state(struct pci_dev *pdev, int state, bool locked)
  * touch the LNKCTL register. Also note that this does not enable states
  * disabled by pci_disable_link_state(). Return 0 or a negative errno.
  *
+ * Note: The PCIe devices of the link must be in D0 if the PCI-PM L1 PM
+ * Substates are going to be enabled, per Section 5.5.4 of PCIe Base Spec
+ * Revision 6.0.
+ *
  * @pdev: PCI device
  * @state: Mask of ASPM link states to enable
  */
@@ -1198,6 +1204,10 @@ EXPORT_SYMBOL(pci_enable_link_state);
  * can't touch the LNKCTL register. Also note that this does not enable states
  * disabled by pci_disable_link_state(). Return 0 or a negative errno.
  *
+ * Note: The PCIe devices of the link must be in D0 if the PCI-PM L1 PM
+ * Substates are going to be enabled, per Section 5.5.4 of PCIe Base Spec
+ * Revision 6.0.
+ *
  * @pdev: PCI device
  * @state: Mask of ASPM link states to enable
  */
The remapped PCIe Root Port and NVMe have the PCI PM L1 substates
capability, but it is initially disabled:

Here is an example on ASUS B1400CEAE:

  Capabilities: [900 v1] L1 PM Substates
    L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+
              PortCommonModeRestoreTime=32us PortTPowerOnTime=10us
    L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
               T_CommonMode=0us LTR1.2_Threshold=0ns
    L1SubCtl2: T_PwrOn=10us

Power on all of the VMD remapped PCI devices and quirk the max snoop
LTR before enabling PCI-PM L1 PM Substates, following Section 5.5.4 of
PCIe Base Spec Revision 6.0. Then, the PCI PM L1 substates controls are
initialized and enabled accordingly. Also, update the comments of
pci_enable_link_state() and pci_enable_link_state_locked() for
kernel-doc.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=218394
Signed-off-by: Jian-Hong Pan <jhp@endlessos.org>
---
v2:
- Power on the VMD remapped devices with pci_set_power_state_locked()
- Prepare the PCIe LTR parameters before enabling L1 Substates
- Add a note to the comments of both pci_enable_link_state() and
  pci_enable_link_state_locked() for kernel-doc
- Split the original patch set into individual patches

 drivers/pci/controller/vmd.c | 15 ++++++++++-----
 drivers/pci/pcie/aspm.c      | 10 ++++++++++
 2 files changed, 20 insertions(+), 5 deletions(-)