Message ID | 20240719080255.10998-2-jhp@endlessos.org (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Bjorn Helgaas |
Headers | show |
Series | PCI: vmd: Enable PCI PM's L1 substates of remapped PCIe Root Port and NVMe | expand |
Jian-Hong Pan <jhp@endlessos.org> 於 2024年7月19日 週五 下午4:04寫道: > > Currently, when enable link's L1.2 features with __pci_enable_link_state(), > it configs the link directly without ensuring related L1.2 parameters, such > as T_POWER_ON, Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD have been > programmed. > > This leads the link's L1.2 between PCIe Root Port and child device gets > wrong configs when a caller tries to enabled it. > > Here is a failed example on ASUS B1400CEAE with enabled VMD: > > 10000:e0:06.0 PCI bridge: Intel Corporation 11th Gen Core Processor PCIe Controller (rev 01) (prog-if 00 [Normal decode]) > ... > Capabilities: [200 v1] L1 PM Substates > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+ > PortCommonModeRestoreTime=45us PortTPowerOnTime=50us > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > T_CommonMode=45us LTR1.2_Threshold=101376ns > L1SubCtl2: T_PwrOn=50us > > 10000:e1:00.0 Non-Volatile memory controller: Sandisk Corp WD Blue SN550 NVMe SSD (rev 01) (prog-if 02 [NVM Express]) > ... > Capabilities: [900 v1] L1 PM Substates > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+ > PortCommonModeRestoreTime=32us PortTPowerOnTime=10us > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > T_CommonMode=0us LTR1.2_Threshold=0ns > L1SubCtl2: T_PwrOn=10us > > According to "PCIe r6.0, sec 5.5.4", before enabling ASPM L1.2 on the PCIe > Root Port and the child NVMe, they should be programmed with the same > LTR1.2_Threshold value. However, they have different values in this case. > > Invoke aspm_calc_l12_info() to program the L1.2 parameters properly before > enable L1.2 bits of L1 PM Substates Control Register in > __pci_enable_link_state(). > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218394 > Signed-off-by: Jian-Hong Pan <jhp@endlessos.org> > --- > v2: > - Prepare the PCIe LTR parameters before enable L1 Substates > > v3: > - Only enable supported features for the L1 Substates part > > v4: > - Focus on fixing L1.2 parameters, instead of re-initializing whole L1SS > > v5: > - Fix typo and commit message > - Split introducing aspm_get_l1ss_cap() to "PCI/ASPM: Introduce > aspm_get_l1ss_cap()" > > v6: > - Skipped > > v7: > - Pick back and rebase on the new version kernel > - Drop the link state flag check. And, always config link state's timing > parameters > > v8: > - Because pcie_aspm_get_link() might return the link as NULL, move > getting the link's parent and child devices after check the link is > not NULL. This avoids NULL memory access. > > drivers/pci/pcie/aspm.c | 15 +++++++++++++++ > 1 file changed, 15 insertions(+) > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c > index 5db1044c9895..55ff1d26fcea 100644 > --- a/drivers/pci/pcie/aspm.c > +++ b/drivers/pci/pcie/aspm.c > @@ -1411,9 +1411,15 @@ EXPORT_SYMBOL(pci_disable_link_state); > static int __pci_enable_link_state(struct pci_dev *pdev, int state, bool locked) > { > struct pcie_link_state *link = pcie_aspm_get_link(pdev); > + u32 parent_l1ss_cap, child_l1ss_cap; > + struct pci_dev *parent, *child; > > if (!link) > return -EINVAL; > + > + parent = link->pdev; > + child = link->downstream; > + > /* > * A driver requested that ASPM be enabled on this device, but > * if we don't have permission to manage ASPM (e.g., on ACPI > @@ -1428,6 +1434,15 @@ static int __pci_enable_link_state(struct pci_dev *pdev, int state, bool locked) > if (!locked) > down_read(&pci_bus_sem); > mutex_lock(&aspm_lock); > + /* > + * Ensure L1.2 parameters: Common_Mode_Restore_Times, T_POWER_ON and > + * LTR_L1.2_THRESHOLD are programmed properly before enable bits for > + * L1.2, per PCIe r6.0, sec 5.5.4. > + */ > + parent_l1ss_cap = aspm_get_l1ss_cap(parent); > + child_l1ss_cap = aspm_get_l1ss_cap(child); > + aspm_calc_l12_info(link, parent_l1ss_cap, child_l1ss_cap); > + > link->aspm_default = pci_calc_aspm_enable_mask(state); > pcie_config_aspm_link(link, policy_to_aspm_state(link)); > > -- > 2.45.2 > Hi Nirmal and Paul, It will be great to have your review here. I had tried to "set the threshold value in vmd_pm_enable_quirk()" directly as Paul said [1]. However, it still needs to get the PCIe link from the PCIe device to set the threshold value. And, pci_enable_link_state_locked() gets the link. Then, it will be great to calculate and programm L1 sub-states' parameters properly before configuring the link's ASPM there. [1]: https://lore.kernel.org/linux-kernel/20240624081108.10143-2-jhp@endlessos.org/T/#mc467498213fe1a6116985c04d714dae378976124 Jian-Hong Pan
On Fri, 2 Aug 2024 16:24:18 +0800 Jian-Hong Pan <jhp@endlessos.org> wrote: > Jian-Hong Pan <jhp@endlessos.org> 於 2024年7月19日 週五 下午4:04寫道: > > > > Currently, when enable link's L1.2 features with > > __pci_enable_link_state(), it configs the link directly without > > ensuring related L1.2 parameters, such as T_POWER_ON, > > Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD have been > > programmed. > > > > This leads the link's L1.2 between PCIe Root Port and child device > > gets wrong configs when a caller tries to enabled it. > > > > Here is a failed example on ASUS B1400CEAE with enabled VMD: > > > > 10000:e0:06.0 PCI bridge: Intel Corporation 11th Gen Core Processor > > PCIe Controller (rev 01) (prog-if 00 [Normal decode]) ... > > Capabilities: [200 v1] L1 PM Substates > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ > > L1_PM_Substates+ PortCommonModeRestoreTime=45us > > PortTPowerOnTime=50us L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- > > ASPM_L1.2+ ASPM_L1.1- T_CommonMode=45us LTR1.2_Threshold=101376ns > > L1SubCtl2: T_PwrOn=50us > > > > 10000:e1:00.0 Non-Volatile memory controller: Sandisk Corp WD Blue > > SN550 NVMe SSD (rev 01) (prog-if 02 [NVM Express]) ... > > Capabilities: [900 v1] L1 PM Substates > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > L1_PM_Substates+ PortCommonModeRestoreTime=32us > > PortTPowerOnTime=10us L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- > > ASPM_L1.2+ ASPM_L1.1- T_CommonMode=0us LTR1.2_Threshold=0ns > > L1SubCtl2: T_PwrOn=10us > > > > According to "PCIe r6.0, sec 5.5.4", before enabling ASPM L1.2 on > > the PCIe Root Port and the child NVMe, they should be programmed > > with the same LTR1.2_Threshold value. However, they have different > > values in this case. > > > > Invoke aspm_calc_l12_info() to program the L1.2 parameters properly > > before enable L1.2 bits of L1 PM Substates Control Register in > > __pci_enable_link_state(). > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218394 > > Signed-off-by: Jian-Hong Pan <jhp@endlessos.org> > > --- > > v2: > > - Prepare the PCIe LTR parameters before enable L1 Substates > > > > v3: > > - Only enable supported features for the L1 Substates part > > > > v4: > > - Focus on fixing L1.2 parameters, instead of re-initializing whole > > L1SS > > > > v5: > > - Fix typo and commit message > > - Split introducing aspm_get_l1ss_cap() to "PCI/ASPM: Introduce > > aspm_get_l1ss_cap()" > > > > v6: > > - Skipped > > > > v7: > > - Pick back and rebase on the new version kernel > > - Drop the link state flag check. And, always config link state's > > timing parameters > > > > v8: > > - Because pcie_aspm_get_link() might return the link as NULL, move > > getting the link's parent and child devices after check the link > > is not NULL. This avoids NULL memory access. > > > > drivers/pci/pcie/aspm.c | 15 +++++++++++++++ > > 1 file changed, 15 insertions(+) > > > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c > > index 5db1044c9895..55ff1d26fcea 100644 > > --- a/drivers/pci/pcie/aspm.c > > +++ b/drivers/pci/pcie/aspm.c > > @@ -1411,9 +1411,15 @@ EXPORT_SYMBOL(pci_disable_link_state); > > static int __pci_enable_link_state(struct pci_dev *pdev, int > > state, bool locked) { > > struct pcie_link_state *link = pcie_aspm_get_link(pdev); > > + u32 parent_l1ss_cap, child_l1ss_cap; > > + struct pci_dev *parent, *child; > > > > if (!link) > > return -EINVAL; > > + > > + parent = link->pdev; > > + child = link->downstream; > > + > > /* > > * A driver requested that ASPM be enabled on this device, > > but > > * if we don't have permission to manage ASPM (e.g., on ACPI > > @@ -1428,6 +1434,15 @@ static int __pci_enable_link_state(struct > > pci_dev *pdev, int state, bool locked) if (!locked) > > down_read(&pci_bus_sem); > > mutex_lock(&aspm_lock); > > + /* > > + * Ensure L1.2 parameters: Common_Mode_Restore_Times, > > T_POWER_ON and > > + * LTR_L1.2_THRESHOLD are programmed properly before enable > > bits for > > + * L1.2, per PCIe r6.0, sec 5.5.4. > > + */ > > + parent_l1ss_cap = aspm_get_l1ss_cap(parent); > > + child_l1ss_cap = aspm_get_l1ss_cap(child); > > + aspm_calc_l12_info(link, parent_l1ss_cap, child_l1ss_cap); > > + > > link->aspm_default = pci_calc_aspm_enable_mask(state); > > pcie_config_aspm_link(link, policy_to_aspm_state(link)); > > > > -- > > 2.45.2 > > > > Hi Nirmal and Paul, > > It will be great to have your review here. > > I had tried to "set the threshold value in vmd_pm_enable_quirk()" > directly as Paul said [1]. However, it still needs to get the PCIe > link from the PCIe device to set the threshold value. > And, pci_enable_link_state_locked() gets the link. Then, it will be > great to calculate and programm L1 sub-states' parameters properly > before configuring the link's ASPM there. > > [1]: > https://lore.kernel.org/linux-kernel/20240624081108.10143-2-jhp@endlessos.org/T/#mc467498213fe1a6116985c04d714dae378976124 > > Jian-Hong Pan Hi Jian-Hong Pan, I am not an LTR, ASPM expert, but this part looks good to me. Can you explain why you decided to move pci_enable_link_state_locked() call down to out_state_change in vmd.c? Will it cause any issue if pci_find_ext_capability returns 0? Thanks -nirmal
Hi Jian-Hong, On Fri, 2024-08-02 at 16:24 +0800, Jian-Hong Pan wrote: > Jian-Hong Pan <jhp@endlessos.org> 於 2024年7月19日 週五 下午4:04寫道: > > > > Currently, when enable link's L1.2 features with __pci_enable_link_state(), > > it configs the link directly without ensuring related L1.2 parameters, such > > as T_POWER_ON, Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD have been > > programmed. > > > > This leads the link's L1.2 between PCIe Root Port and child device gets > > wrong configs when a caller tries to enabled it. > > > > Here is a failed example on ASUS B1400CEAE with enabled VMD: > > > > 10000:e0:06.0 PCI bridge: Intel Corporation 11th Gen Core Processor PCIe > > Controller (rev 01) (prog-if 00 [Normal decode]) > > ... > > Capabilities: [200 v1] L1 PM Substates > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ > > L1_PM_Substates+ > > PortCommonModeRestoreTime=45us PortTPowerOnTime=50us > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > T_CommonMode=45us LTR1.2_Threshold=101376ns > > L1SubCtl2: T_PwrOn=50us > > > > 10000:e1:00.0 Non-Volatile memory controller: Sandisk Corp WD Blue SN550 > > NVMe SSD (rev 01) (prog-if 02 [NVM Express]) > > ... > > Capabilities: [900 v1] L1 PM Substates > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > L1_PM_Substates+ > > PortCommonModeRestoreTime=32us PortTPowerOnTime=10us > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > T_CommonMode=0us LTR1.2_Threshold=0ns > > L1SubCtl2: T_PwrOn=10us > > > > According to "PCIe r6.0, sec 5.5.4", before enabling ASPM L1.2 on the PCIe > > Root Port and the child NVMe, they should be programmed with the same > > LTR1.2_Threshold value. However, they have different values in this case. > > > > Invoke aspm_calc_l12_info() to program the L1.2 parameters properly before > > enable L1.2 bits of L1 PM Substates Control Register in > > __pci_enable_link_state(). > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218394 > > Signed-off-by: Jian-Hong Pan <jhp@endlessos.org> > > --- > > v2: > > - Prepare the PCIe LTR parameters before enable L1 Substates > > > > v3: > > - Only enable supported features for the L1 Substates part > > > > v4: > > - Focus on fixing L1.2 parameters, instead of re-initializing whole L1SS > > > > v5: > > - Fix typo and commit message > > - Split introducing aspm_get_l1ss_cap() to "PCI/ASPM: Introduce > > aspm_get_l1ss_cap()" > > > > v6: > > - Skipped > > > > v7: > > - Pick back and rebase on the new version kernel > > - Drop the link state flag check. And, always config link state's timing > > parameters > > > > v8: > > - Because pcie_aspm_get_link() might return the link as NULL, move > > getting the link's parent and child devices after check the link is > > not NULL. This avoids NULL memory access. > > > > drivers/pci/pcie/aspm.c | 15 +++++++++++++++ > > 1 file changed, 15 insertions(+) > > > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c > > index 5db1044c9895..55ff1d26fcea 100644 > > --- a/drivers/pci/pcie/aspm.c > > +++ b/drivers/pci/pcie/aspm.c > > @@ -1411,9 +1411,15 @@ EXPORT_SYMBOL(pci_disable_link_state); > > static int __pci_enable_link_state(struct pci_dev *pdev, int state, bool > > locked) > > { > > struct pcie_link_state *link = pcie_aspm_get_link(pdev); > > + u32 parent_l1ss_cap, child_l1ss_cap; > > + struct pci_dev *parent, *child; > > > > if (!link) > > return -EINVAL; > > + > > + parent = link->pdev; > > + child = link->downstream; > > + > > /* > > * A driver requested that ASPM be enabled on this device, but > > * if we don't have permission to manage ASPM (e.g., on ACPI > > @@ -1428,6 +1434,15 @@ static int __pci_enable_link_state(struct pci_dev > > *pdev, int state, bool locked) > > if (!locked) > > down_read(&pci_bus_sem); > > mutex_lock(&aspm_lock); > > + /* > > + * Ensure L1.2 parameters: Common_Mode_Restore_Times, T_POWER_ON and > > + * LTR_L1.2_THRESHOLD are programmed properly before enable bits for > > + * L1.2, per PCIe r6.0, sec 5.5.4. > > + */ > > + parent_l1ss_cap = aspm_get_l1ss_cap(parent); > > + child_l1ss_cap = aspm_get_l1ss_cap(child); > > + aspm_calc_l12_info(link, parent_l1ss_cap, child_l1ss_cap); I still don't think this is the place to recalculate the L1.2 parameters especially when know the calculation was done but was cleared by pci_bus_reset(). Can't we just do a pci_save/restore_state() before/after pci_bus_reset() in vmd.c? David > > + > > link->aspm_default = pci_calc_aspm_enable_mask(state); > > pcie_config_aspm_link(link, policy_to_aspm_state(link)); > > > > -- > > 2.45.2 > > > > Hi Nirmal and Paul, > > It will be great to have your review here. > > I had tried to "set the threshold value in vmd_pm_enable_quirk()" > directly as Paul said [1]. However, it still needs to get the PCIe > link from the PCIe device to set the threshold value. > And, pci_enable_link_state_locked() gets the link. Then, it will be > great to calculate and programm L1 sub-states' parameters properly > before configuring the link's ASPM there. > > [1]: > https://lore.kernel.org/linux-kernel/20240624081108.10143-2-jhp@endlessos.org/T/#mc467498213fe1a6116985c04d714dae378976124 > > Jian-Hong Pan
Nirmal Patel <nirmal.patel@linux.intel.com> 於 2024年8月6日 週二 上午2:25寫道: > > On Fri, 2 Aug 2024 16:24:18 +0800 > Jian-Hong Pan <jhp@endlessos.org> wrote: > > > Jian-Hong Pan <jhp@endlessos.org> 於 2024年7月19日 週五 下午4:04寫道: > > > > > > Currently, when enable link's L1.2 features with > > > __pci_enable_link_state(), it configs the link directly without > > > ensuring related L1.2 parameters, such as T_POWER_ON, > > > Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD have been > > > programmed. > > > > > > This leads the link's L1.2 between PCIe Root Port and child device > > > gets wrong configs when a caller tries to enabled it. > > > > > > Here is a failed example on ASUS B1400CEAE with enabled VMD: > > > > > > 10000:e0:06.0 PCI bridge: Intel Corporation 11th Gen Core Processor > > > PCIe Controller (rev 01) (prog-if 00 [Normal decode]) ... > > > Capabilities: [200 v1] L1 PM Substates > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ > > > L1_PM_Substates+ PortCommonModeRestoreTime=45us > > > PortTPowerOnTime=50us L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- > > > ASPM_L1.2+ ASPM_L1.1- T_CommonMode=45us LTR1.2_Threshold=101376ns > > > L1SubCtl2: T_PwrOn=50us > > > > > > 10000:e1:00.0 Non-Volatile memory controller: Sandisk Corp WD Blue > > > SN550 NVMe SSD (rev 01) (prog-if 02 [NVM Express]) ... > > > Capabilities: [900 v1] L1 PM Substates > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > > L1_PM_Substates+ PortCommonModeRestoreTime=32us > > > PortTPowerOnTime=10us L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- > > > ASPM_L1.2+ ASPM_L1.1- T_CommonMode=0us LTR1.2_Threshold=0ns > > > L1SubCtl2: T_PwrOn=10us > > > > > > According to "PCIe r6.0, sec 5.5.4", before enabling ASPM L1.2 on > > > the PCIe Root Port and the child NVMe, they should be programmed > > > with the same LTR1.2_Threshold value. However, they have different > > > values in this case. > > > > > > Invoke aspm_calc_l12_info() to program the L1.2 parameters properly > > > before enable L1.2 bits of L1 PM Substates Control Register in > > > __pci_enable_link_state(). > > > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218394 > > > Signed-off-by: Jian-Hong Pan <jhp@endlessos.org> > > > --- > > > v2: > > > - Prepare the PCIe LTR parameters before enable L1 Substates > > > > > > v3: > > > - Only enable supported features for the L1 Substates part > > > > > > v4: > > > - Focus on fixing L1.2 parameters, instead of re-initializing whole > > > L1SS > > > > > > v5: > > > - Fix typo and commit message > > > - Split introducing aspm_get_l1ss_cap() to "PCI/ASPM: Introduce > > > aspm_get_l1ss_cap()" > > > > > > v6: > > > - Skipped > > > > > > v7: > > > - Pick back and rebase on the new version kernel > > > - Drop the link state flag check. And, always config link state's > > > timing parameters > > > > > > v8: > > > - Because pcie_aspm_get_link() might return the link as NULL, move > > > getting the link's parent and child devices after check the link > > > is not NULL. This avoids NULL memory access. > > > > > > drivers/pci/pcie/aspm.c | 15 +++++++++++++++ > > > 1 file changed, 15 insertions(+) > > > > > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c > > > index 5db1044c9895..55ff1d26fcea 100644 > > > --- a/drivers/pci/pcie/aspm.c > > > +++ b/drivers/pci/pcie/aspm.c > > > @@ -1411,9 +1411,15 @@ EXPORT_SYMBOL(pci_disable_link_state); > > > static int __pci_enable_link_state(struct pci_dev *pdev, int > > > state, bool locked) { > > > struct pcie_link_state *link = pcie_aspm_get_link(pdev); > > > + u32 parent_l1ss_cap, child_l1ss_cap; > > > + struct pci_dev *parent, *child; > > > > > > if (!link) > > > return -EINVAL; > > > + > > > + parent = link->pdev; > > > + child = link->downstream; > > > + > > > /* > > > * A driver requested that ASPM be enabled on this device, > > > but > > > * if we don't have permission to manage ASPM (e.g., on ACPI > > > @@ -1428,6 +1434,15 @@ static int __pci_enable_link_state(struct > > > pci_dev *pdev, int state, bool locked) if (!locked) > > > down_read(&pci_bus_sem); > > > mutex_lock(&aspm_lock); > > > + /* > > > + * Ensure L1.2 parameters: Common_Mode_Restore_Times, > > > T_POWER_ON and > > > + * LTR_L1.2_THRESHOLD are programmed properly before enable > > > bits for > > > + * L1.2, per PCIe r6.0, sec 5.5.4. > > > + */ > > > + parent_l1ss_cap = aspm_get_l1ss_cap(parent); > > > + child_l1ss_cap = aspm_get_l1ss_cap(child); > > > + aspm_calc_l12_info(link, parent_l1ss_cap, child_l1ss_cap); > > > + > > > link->aspm_default = pci_calc_aspm_enable_mask(state); > > > pcie_config_aspm_link(link, policy_to_aspm_state(link)); > > > > > > -- > > > 2.45.2 > > > > > > > Hi Nirmal and Paul, > > > > It will be great to have your review here. > > > > I had tried to "set the threshold value in vmd_pm_enable_quirk()" > > directly as Paul said [1]. However, it still needs to get the PCIe > > link from the PCIe device to set the threshold value. > > And, pci_enable_link_state_locked() gets the link. Then, it will be > > great to calculate and programm L1 sub-states' parameters properly > > before configuring the link's ASPM there. > > > > [1]: > > https://lore.kernel.org/linux-kernel/20240624081108.10143-2-jhp@endlessos.org/T/#mc467498213fe1a6116985c04d714dae378976124 > > > > Jian-Hong Pan > > Hi Jian-Hong Pan, > > I am not an LTR, ASPM expert, but this part looks good to me. > > Can you explain why you decided to move pci_enable_link_state_locked() > call down to out_state_change in vmd.c? The idea is setting all LTR related parameters before enabling the ASPM feature. > Will it cause any issue if pci_find_ext_capability returns 0? If pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_LTR) in vmd_pm_enable_quirk() returns 0, then the device is not a PCIe device. Then, it goes to: ... pci_enable_link_state_locked() __pci_enable_link_state() __pci_enable_link_state() uses pcie_aspm_get_link() to get the link between the PCIe bridge and the PCIe device. And, pcie_aspm_get_link() returns the link as a barrier. If pcie_aspm_get_link() does not get the link, then the device is a PCIe bridge, or not a PCIe device. Because the link is NULL, __pci_enable_link_state() returns with -EINVAL directly and will not configure/enable ASPM things. Jian-Hong Pan
David E. Box <david.e.box@linux.intel.com> 於 2024年8月6日 週二 上午4:26寫道: > > Hi Jian-Hong, > > On Fri, 2024-08-02 at 16:24 +0800, Jian-Hong Pan wrote: > > Jian-Hong Pan <jhp@endlessos.org> 於 2024年7月19日 週五 下午4:04寫道: > > > > > > Currently, when enable link's L1.2 features with __pci_enable_link_state(), > > > it configs the link directly without ensuring related L1.2 parameters, such > > > as T_POWER_ON, Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD have been > > > programmed. > > > > > > This leads the link's L1.2 between PCIe Root Port and child device gets > > > wrong configs when a caller tries to enabled it. > > > > > > Here is a failed example on ASUS B1400CEAE with enabled VMD: > > > > > > 10000:e0:06.0 PCI bridge: Intel Corporation 11th Gen Core Processor PCIe > > > Controller (rev 01) (prog-if 00 [Normal decode]) > > > ... > > > Capabilities: [200 v1] L1 PM Substates > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ > > > L1_PM_Substates+ > > > PortCommonModeRestoreTime=45us PortTPowerOnTime=50us > > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > > T_CommonMode=45us LTR1.2_Threshold=101376ns > > > L1SubCtl2: T_PwrOn=50us > > > > > > 10000:e1:00.0 Non-Volatile memory controller: Sandisk Corp WD Blue SN550 > > > NVMe SSD (rev 01) (prog-if 02 [NVM Express]) > > > ... > > > Capabilities: [900 v1] L1 PM Substates > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > > L1_PM_Substates+ > > > PortCommonModeRestoreTime=32us PortTPowerOnTime=10us > > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > > T_CommonMode=0us LTR1.2_Threshold=0ns > > > L1SubCtl2: T_PwrOn=10us > > > > > > According to "PCIe r6.0, sec 5.5.4", before enabling ASPM L1.2 on the PCIe > > > Root Port and the child NVMe, they should be programmed with the same > > > LTR1.2_Threshold value. However, they have different values in this case. > > > > > > Invoke aspm_calc_l12_info() to program the L1.2 parameters properly before > > > enable L1.2 bits of L1 PM Substates Control Register in > > > __pci_enable_link_state(). > > > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218394 > > > Signed-off-by: Jian-Hong Pan <jhp@endlessos.org> > > > --- > > > v2: > > > - Prepare the PCIe LTR parameters before enable L1 Substates > > > > > > v3: > > > - Only enable supported features for the L1 Substates part > > > > > > v4: > > > - Focus on fixing L1.2 parameters, instead of re-initializing whole L1SS > > > > > > v5: > > > - Fix typo and commit message > > > - Split introducing aspm_get_l1ss_cap() to "PCI/ASPM: Introduce > > > aspm_get_l1ss_cap()" > > > > > > v6: > > > - Skipped > > > > > > v7: > > > - Pick back and rebase on the new version kernel > > > - Drop the link state flag check. And, always config link state's timing > > > parameters > > > > > > v8: > > > - Because pcie_aspm_get_link() might return the link as NULL, move > > > getting the link's parent and child devices after check the link is > > > not NULL. This avoids NULL memory access. > > > > > > drivers/pci/pcie/aspm.c | 15 +++++++++++++++ > > > 1 file changed, 15 insertions(+) > > > > > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c > > > index 5db1044c9895..55ff1d26fcea 100644 > > > --- a/drivers/pci/pcie/aspm.c > > > +++ b/drivers/pci/pcie/aspm.c > > > @@ -1411,9 +1411,15 @@ EXPORT_SYMBOL(pci_disable_link_state); > > > static int __pci_enable_link_state(struct pci_dev *pdev, int state, bool > > > locked) > > > { > > > struct pcie_link_state *link = pcie_aspm_get_link(pdev); > > > + u32 parent_l1ss_cap, child_l1ss_cap; > > > + struct pci_dev *parent, *child; > > > > > > if (!link) > > > return -EINVAL; > > > + > > > + parent = link->pdev; > > > + child = link->downstream; > > > + > > > /* > > > * A driver requested that ASPM be enabled on this device, but > > > * if we don't have permission to manage ASPM (e.g., on ACPI > > > @@ -1428,6 +1434,15 @@ static int __pci_enable_link_state(struct pci_dev > > > *pdev, int state, bool locked) > > > if (!locked) > > > down_read(&pci_bus_sem); > > > mutex_lock(&aspm_lock); > > > + /* > > > + * Ensure L1.2 parameters: Common_Mode_Restore_Times, T_POWER_ON and > > > + * LTR_L1.2_THRESHOLD are programmed properly before enable bits for > > > + * L1.2, per PCIe r6.0, sec 5.5.4. > > > + */ > > > + parent_l1ss_cap = aspm_get_l1ss_cap(parent); > > > + child_l1ss_cap = aspm_get_l1ss_cap(child); > > > + aspm_calc_l12_info(link, parent_l1ss_cap, child_l1ss_cap); > > I still don't think this is the place to recalculate the L1.2 parameters > especially when know the calculation was done but was cleared by > pci_bus_reset(). Can't we just do a pci_save/restore_state() before/after > pci_bus_reset() in vmd.c? I have not thought pci_save/restore_state() around pci_bus_reset() before. It is an interesting direction. So, I prepare modification below for test. Include "[PATCH v8 1/4] PCI: vmd: Set PCI devices to D0 before enable PCI PM's L1 substates", too. Then, both the PCIe bridge and the PCIe device have the same LTR_L1.2_THRESHOLD 101376ns as expected. diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c index bbf4a47e7b31..6b8dd4f30127 100644 --- a/drivers/pci/controller/vmd.c +++ b/drivers/pci/controller/vmd.c @@ -727,6 +727,18 @@ static void vmd_copy_host_bridge_flags(struct pci_host_bridge *root_bridge, vmd_bridge->native_dpc = root_bridge->native_dpc; } +static int vmd_pci_save_state(struct pci_dev *pdev, void *userdata) +{ + pci_save_state(pdev); + return 0; +} + +static int vmd_pci_restore_state(struct pci_dev *pdev, void *userdata) +{ + pci_restore_state(pdev); + return 0; +} + /* * Enable ASPM and LTR settings on devices that aren't configured by BIOS. */ @@ -927,6 +939,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features) pci_scan_child_bus(vmd->bus); vmd_domain_reset(vmd); + pci_walk_bus(vmd->bus, vmd_pci_save_state, NULL); /* When Intel VMD is enabled, the OS does not discover the Root Ports * owned by Intel VMD within the MMCFG space. pci_reset_bus() applies * a reset to the parent of the PCI device supplied as argument. This @@ -945,6 +958,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features) break; } } + pci_walk_bus(vmd->bus, vmd_pci_restore_state, NULL); pci_assign_unassigned_bus_resources(vmd->bus); Jian-Hong Pan > > > + > > > link->aspm_default = pci_calc_aspm_enable_mask(state); > > > pcie_config_aspm_link(link, policy_to_aspm_state(link)); > > > > > > -- > > > 2.45.2 > > > > > > > Hi Nirmal and Paul, > > > > It will be great to have your review here. > > > > I had tried to "set the threshold value in vmd_pm_enable_quirk()" > > directly as Paul said [1]. However, it still needs to get the PCIe > > link from the PCIe device to set the threshold value. > > And, pci_enable_link_state_locked() gets the link. Then, it will be > > great to calculate and programm L1 sub-states' parameters properly > > before configuring the link's ASPM there. > > > > [1]: > > https://lore.kernel.org/linux-kernel/20240624081108.10143-2-jhp@endlessos.org/T/#mc467498213fe1a6116985c04d714dae378976124 > > > > Jian-Hong Pan >
On Wed, 7 Aug 2024, Jian-Hong Pan wrote: > David E. Box <david.e.box@linux.intel.com> 於 2024年8月6日 週二 上午4:26寫道: > > > > Hi Jian-Hong, > > > > On Fri, 2024-08-02 at 16:24 +0800, Jian-Hong Pan wrote: > > > Jian-Hong Pan <jhp@endlessos.org> 於 2024年7月19日 週五 下午4:04寫道: > > > > > > > > Currently, when enable link's L1.2 features with __pci_enable_link_state(), > > > > it configs the link directly without ensuring related L1.2 parameters, such > > > > as T_POWER_ON, Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD have been > > > > programmed. > > > > > > > > This leads the link's L1.2 between PCIe Root Port and child device gets > > > > wrong configs when a caller tries to enabled it. > > > > > > > > Here is a failed example on ASUS B1400CEAE with enabled VMD: > > > > > > > > 10000:e0:06.0 PCI bridge: Intel Corporation 11th Gen Core Processor PCIe > > > > Controller (rev 01) (prog-if 00 [Normal decode]) > > > > ... > > > > Capabilities: [200 v1] L1 PM Substates > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ > > > > L1_PM_Substates+ > > > > PortCommonModeRestoreTime=45us PortTPowerOnTime=50us > > > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > > > T_CommonMode=45us LTR1.2_Threshold=101376ns > > > > L1SubCtl2: T_PwrOn=50us > > > > > > > > 10000:e1:00.0 Non-Volatile memory controller: Sandisk Corp WD Blue SN550 > > > > NVMe SSD (rev 01) (prog-if 02 [NVM Express]) > > > > ... > > > > Capabilities: [900 v1] L1 PM Substates > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > > > L1_PM_Substates+ > > > > PortCommonModeRestoreTime=32us PortTPowerOnTime=10us > > > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > > > T_CommonMode=0us LTR1.2_Threshold=0ns > > > > L1SubCtl2: T_PwrOn=10us > > > > > > > > According to "PCIe r6.0, sec 5.5.4", before enabling ASPM L1.2 on the PCIe > > > > Root Port and the child NVMe, they should be programmed with the same > > > > LTR1.2_Threshold value. However, they have different values in this case. > > > > > > > > Invoke aspm_calc_l12_info() to program the L1.2 parameters properly before > > > > enable L1.2 bits of L1 PM Substates Control Register in > > > > __pci_enable_link_state(). > > > > > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218394 > > > > Signed-off-by: Jian-Hong Pan <jhp@endlessos.org> > > > > --- > > > > v2: > > > > - Prepare the PCIe LTR parameters before enable L1 Substates > > > > > > > > v3: > > > > - Only enable supported features for the L1 Substates part > > > > > > > > v4: > > > > - Focus on fixing L1.2 parameters, instead of re-initializing whole L1SS > > > > > > > > v5: > > > > - Fix typo and commit message > > > > - Split introducing aspm_get_l1ss_cap() to "PCI/ASPM: Introduce > > > > aspm_get_l1ss_cap()" > > > > > > > > v6: > > > > - Skipped > > > > > > > > v7: > > > > - Pick back and rebase on the new version kernel > > > > - Drop the link state flag check. And, always config link state's timing > > > > parameters > > > > > > > > v8: > > > > - Because pcie_aspm_get_link() might return the link as NULL, move > > > > getting the link's parent and child devices after check the link is > > > > not NULL. This avoids NULL memory access. > > > > > > > > drivers/pci/pcie/aspm.c | 15 +++++++++++++++ > > > > 1 file changed, 15 insertions(+) > > > > > > > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c > > > > index 5db1044c9895..55ff1d26fcea 100644 > > > > --- a/drivers/pci/pcie/aspm.c > > > > +++ b/drivers/pci/pcie/aspm.c > > > > @@ -1411,9 +1411,15 @@ EXPORT_SYMBOL(pci_disable_link_state); > > > > static int __pci_enable_link_state(struct pci_dev *pdev, int state, bool > > > > locked) > > > > { > > > > struct pcie_link_state *link = pcie_aspm_get_link(pdev); > > > > + u32 parent_l1ss_cap, child_l1ss_cap; > > > > + struct pci_dev *parent, *child; > > > > > > > > if (!link) > > > > return -EINVAL; > > > > + > > > > + parent = link->pdev; > > > > + child = link->downstream; > > > > + > > > > /* > > > > * A driver requested that ASPM be enabled on this device, but > > > > * if we don't have permission to manage ASPM (e.g., on ACPI > > > > @@ -1428,6 +1434,15 @@ static int __pci_enable_link_state(struct pci_dev > > > > *pdev, int state, bool locked) > > > > if (!locked) > > > > down_read(&pci_bus_sem); > > > > mutex_lock(&aspm_lock); > > > > + /* > > > > + * Ensure L1.2 parameters: Common_Mode_Restore_Times, T_POWER_ON and > > > > + * LTR_L1.2_THRESHOLD are programmed properly before enable bits for > > > > + * L1.2, per PCIe r6.0, sec 5.5.4. > > > > + */ > > > > + parent_l1ss_cap = aspm_get_l1ss_cap(parent); > > > > + child_l1ss_cap = aspm_get_l1ss_cap(child); > > > > + aspm_calc_l12_info(link, parent_l1ss_cap, child_l1ss_cap); > > > > I still don't think this is the place to recalculate the L1.2 parameters > > especially when know the calculation was done but was cleared by > > pci_bus_reset(). Can't we just do a pci_save/restore_state() before/after > > pci_bus_reset() in vmd.c? > > I have not thought pci_save/restore_state() around pci_bus_reset() > before. It is an interesting direction. > > So, I prepare modification below for test. Include "[PATCH v8 1/4] > PCI: vmd: Set PCI devices to D0 before enable PCI PM's L1 substates", > too. Then, both the PCIe bridge and the PCIe device have the same > LTR_L1.2_THRESHOLD 101376ns as expected. > > diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c > index bbf4a47e7b31..6b8dd4f30127 100644 > --- a/drivers/pci/controller/vmd.c > +++ b/drivers/pci/controller/vmd.c > @@ -727,6 +727,18 @@ static void vmd_copy_host_bridge_flags(struct > pci_host_bridge *root_bridge, > vmd_bridge->native_dpc = root_bridge->native_dpc; > } > > +static int vmd_pci_save_state(struct pci_dev *pdev, void *userdata) > +{ > + pci_save_state(pdev); > + return 0; > +} > + > +static int vmd_pci_restore_state(struct pci_dev *pdev, void *userdata) > +{ > + pci_restore_state(pdev); > + return 0; > +} > + > /* > * Enable ASPM and LTR settings on devices that aren't configured by BIOS. > */ > @@ -927,6 +939,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, > unsigned long features) > pci_scan_child_bus(vmd->bus); > vmd_domain_reset(vmd); > > + pci_walk_bus(vmd->bus, vmd_pci_save_state, NULL); > /* When Intel VMD is enabled, the OS does not discover the Root Ports > * owned by Intel VMD within the MMCFG space. pci_reset_bus() applies > * a reset to the parent of the PCI device supplied as argument. This > @@ -945,6 +958,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, > unsigned long features) > break; > } > } > + pci_walk_bus(vmd->bus, vmd_pci_restore_state, NULL); Why not call pci_reset_bus() (or __pci_reset_bus()) then in vmd_enable_domain() which preserves state unlike pci_reset_bus()? (Don't tell me naming of these functions is a horrible mess. :-/)
On Wed, 2024-08-07 at 14:18 +0300, Ilpo Järvinen wrote: > On Wed, 7 Aug 2024, Jian-Hong Pan wrote: > > > David E. Box <david.e.box@linux.intel.com> 於 2024年8月6日 週二 上午4:26寫道: > > > > > > Hi Jian-Hong, > > > > > > On Fri, 2024-08-02 at 16:24 +0800, Jian-Hong Pan wrote: > > > > Jian-Hong Pan <jhp@endlessos.org> 於 2024年7月19日 週五 下午4:04寫道: > > > > > > > > > > Currently, when enable link's L1.2 features with > > > > > __pci_enable_link_state(), > > > > > it configs the link directly without ensuring related L1.2 parameters, > > > > > such > > > > > as T_POWER_ON, Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD have > > > > > been > > > > > programmed. > > > > > > > > > > This leads the link's L1.2 between PCIe Root Port and child device > > > > > gets > > > > > wrong configs when a caller tries to enabled it. > > > > > > > > > > Here is a failed example on ASUS B1400CEAE with enabled VMD: > > > > > > > > > > 10000:e0:06.0 PCI bridge: Intel Corporation 11th Gen Core Processor > > > > > PCIe > > > > > Controller (rev 01) (prog-if 00 [Normal decode]) > > > > > ... > > > > > Capabilities: [200 v1] L1 PM Substates > > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ > > > > > L1_PM_Substates+ > > > > > PortCommonModeRestoreTime=45us PortTPowerOnTime=50us > > > > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > > > > T_CommonMode=45us LTR1.2_Threshold=101376ns > > > > > L1SubCtl2: T_PwrOn=50us > > > > > > > > > > 10000:e1:00.0 Non-Volatile memory controller: Sandisk Corp WD Blue > > > > > SN550 > > > > > NVMe SSD (rev 01) (prog-if 02 [NVM Express]) > > > > > ... > > > > > Capabilities: [900 v1] L1 PM Substates > > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > > > > L1_PM_Substates+ > > > > > PortCommonModeRestoreTime=32us PortTPowerOnTime=10us > > > > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > > > > T_CommonMode=0us LTR1.2_Threshold=0ns > > > > > L1SubCtl2: T_PwrOn=10us > > > > > > > > > > According to "PCIe r6.0, sec 5.5.4", before enabling ASPM L1.2 on the > > > > > PCIe > > > > > Root Port and the child NVMe, they should be programmed with the same > > > > > LTR1.2_Threshold value. However, they have different values in this > > > > > case. > > > > > > > > > > Invoke aspm_calc_l12_info() to program the L1.2 parameters properly > > > > > before > > > > > enable L1.2 bits of L1 PM Substates Control Register in > > > > > __pci_enable_link_state(). > > > > > > > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218394 > > > > > Signed-off-by: Jian-Hong Pan <jhp@endlessos.org> > > > > > --- > > > > > v2: > > > > > - Prepare the PCIe LTR parameters before enable L1 Substates > > > > > > > > > > v3: > > > > > - Only enable supported features for the L1 Substates part > > > > > > > > > > v4: > > > > > - Focus on fixing L1.2 parameters, instead of re-initializing whole > > > > > L1SS > > > > > > > > > > v5: > > > > > - Fix typo and commit message > > > > > - Split introducing aspm_get_l1ss_cap() to "PCI/ASPM: Introduce > > > > > aspm_get_l1ss_cap()" > > > > > > > > > > v6: > > > > > - Skipped > > > > > > > > > > v7: > > > > > - Pick back and rebase on the new version kernel > > > > > - Drop the link state flag check. And, always config link state's > > > > > timing > > > > > parameters > > > > > > > > > > v8: > > > > > - Because pcie_aspm_get_link() might return the link as NULL, move > > > > > getting the link's parent and child devices after check the link is > > > > > not NULL. This avoids NULL memory access. > > > > > > > > > > drivers/pci/pcie/aspm.c | 15 +++++++++++++++ > > > > > 1 file changed, 15 insertions(+) > > > > > > > > > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c > > > > > index 5db1044c9895..55ff1d26fcea 100644 > > > > > --- a/drivers/pci/pcie/aspm.c > > > > > +++ b/drivers/pci/pcie/aspm.c > > > > > @@ -1411,9 +1411,15 @@ EXPORT_SYMBOL(pci_disable_link_state); > > > > > static int __pci_enable_link_state(struct pci_dev *pdev, int state, > > > > > bool > > > > > locked) > > > > > { > > > > > struct pcie_link_state *link = pcie_aspm_get_link(pdev); > > > > > + u32 parent_l1ss_cap, child_l1ss_cap; > > > > > + struct pci_dev *parent, *child; > > > > > > > > > > if (!link) > > > > > return -EINVAL; > > > > > + > > > > > + parent = link->pdev; > > > > > + child = link->downstream; > > > > > + > > > > > /* > > > > > * A driver requested that ASPM be enabled on this device, but > > > > > * if we don't have permission to manage ASPM (e.g., on ACPI > > > > > @@ -1428,6 +1434,15 @@ static int __pci_enable_link_state(struct > > > > > pci_dev > > > > > *pdev, int state, bool locked) > > > > > if (!locked) > > > > > down_read(&pci_bus_sem); > > > > > mutex_lock(&aspm_lock); > > > > > + /* > > > > > + * Ensure L1.2 parameters: Common_Mode_Restore_Times, > > > > > T_POWER_ON and > > > > > + * LTR_L1.2_THRESHOLD are programmed properly before enable > > > > > bits for > > > > > + * L1.2, per PCIe r6.0, sec 5.5.4. > > > > > + */ > > > > > + parent_l1ss_cap = aspm_get_l1ss_cap(parent); > > > > > + child_l1ss_cap = aspm_get_l1ss_cap(child); > > > > > + aspm_calc_l12_info(link, parent_l1ss_cap, child_l1ss_cap); > > > > > > I still don't think this is the place to recalculate the L1.2 parameters > > > especially when know the calculation was done but was cleared by > > > pci_bus_reset(). Can't we just do a pci_save/restore_state() before/after > > > pci_bus_reset() in vmd.c? > > > > I have not thought pci_save/restore_state() around pci_bus_reset() > > before. It is an interesting direction. > > > > So, I prepare modification below for test. Include "[PATCH v8 1/4] > > PCI: vmd: Set PCI devices to D0 before enable PCI PM's L1 substates", > > too. Then, both the PCIe bridge and the PCIe device have the same > > LTR_L1.2_THRESHOLD 101376ns as expected. > > > > diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c > > index bbf4a47e7b31..6b8dd4f30127 100644 > > --- a/drivers/pci/controller/vmd.c > > +++ b/drivers/pci/controller/vmd.c > > @@ -727,6 +727,18 @@ static void vmd_copy_host_bridge_flags(struct > > pci_host_bridge *root_bridge, > > vmd_bridge->native_dpc = root_bridge->native_dpc; > > } > > > > +static int vmd_pci_save_state(struct pci_dev *pdev, void *userdata) > > +{ > > + pci_save_state(pdev); > > + return 0; > > +} > > + > > +static int vmd_pci_restore_state(struct pci_dev *pdev, void *userdata) > > +{ > > + pci_restore_state(pdev); > > + return 0; > > +} > > + > > /* > > * Enable ASPM and LTR settings on devices that aren't configured by BIOS. > > */ > > @@ -927,6 +939,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, > > unsigned long features) > > pci_scan_child_bus(vmd->bus); > > vmd_domain_reset(vmd); > > > > + pci_walk_bus(vmd->bus, vmd_pci_save_state, NULL); > > /* When Intel VMD is enabled, the OS does not discover the Root > > Ports > > * owned by Intel VMD within the MMCFG space. pci_reset_bus() > > applies > > * a reset to the parent of the PCI device supplied as argument. > > This > > @@ -945,6 +958,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, > > unsigned long features) > > break; > > } > > } > > + pci_walk_bus(vmd->bus, vmd_pci_restore_state, NULL); > > Why not call pci_reset_bus() (or __pci_reset_bus()) then in > vmd_enable_domain() which preserves state unlike pci_reset_bus()? > > (Don't tell me naming of these functions is a horrible mess. :-/) > Hmm. So this *is* calling pci_reset_bus(). L1.2 configuration has specific ordering requirements for changes to parent & child devices. Could be why it's not getting restored properly. David
On Wed, 7 Aug 2024, David E. Box wrote: > On Wed, 2024-08-07 at 14:18 +0300, Ilpo Järvinen wrote: > > On Wed, 7 Aug 2024, Jian-Hong Pan wrote: > > > > > David E. Box <david.e.box@linux.intel.com> 於 2024年8月6日 週二 上午4:26寫道: > > > > > > > > Hi Jian-Hong, > > > > > > > > On Fri, 2024-08-02 at 16:24 +0800, Jian-Hong Pan wrote: > > > > > Jian-Hong Pan <jhp@endlessos.org> 於 2024年7月19日 週五 下午4:04寫道: > > > > > > > > > > > > Currently, when enable link's L1.2 features with > > > > > > __pci_enable_link_state(), > > > > > > it configs the link directly without ensuring related L1.2 parameters, > > > > > > such > > > > > > as T_POWER_ON, Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD have > > > > > > been > > > > > > programmed. > > > > > > > > > > > > This leads the link's L1.2 between PCIe Root Port and child device > > > > > > gets > > > > > > wrong configs when a caller tries to enabled it. > > > > > > > > > > > > Here is a failed example on ASUS B1400CEAE with enabled VMD: > > > > > > > > > > > > 10000:e0:06.0 PCI bridge: Intel Corporation 11th Gen Core Processor > > > > > > PCIe > > > > > > Controller (rev 01) (prog-if 00 [Normal decode]) > > > > > > ... > > > > > > Capabilities: [200 v1] L1 PM Substates > > > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ > > > > > > L1_PM_Substates+ > > > > > > PortCommonModeRestoreTime=45us PortTPowerOnTime=50us > > > > > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > > > > > T_CommonMode=45us LTR1.2_Threshold=101376ns > > > > > > L1SubCtl2: T_PwrOn=50us > > > > > > > > > > > > 10000:e1:00.0 Non-Volatile memory controller: Sandisk Corp WD Blue > > > > > > SN550 > > > > > > NVMe SSD (rev 01) (prog-if 02 [NVM Express]) > > > > > > ... > > > > > > Capabilities: [900 v1] L1 PM Substates > > > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > > > > > L1_PM_Substates+ > > > > > > PortCommonModeRestoreTime=32us PortTPowerOnTime=10us > > > > > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > > > > > T_CommonMode=0us LTR1.2_Threshold=0ns > > > > > > L1SubCtl2: T_PwrOn=10us > > > > > > > > > > > > According to "PCIe r6.0, sec 5.5.4", before enabling ASPM L1.2 on the > > > > > > PCIe > > > > > > Root Port and the child NVMe, they should be programmed with the same > > > > > > LTR1.2_Threshold value. However, they have different values in this > > > > > > case. > > > > > > > > > > > > Invoke aspm_calc_l12_info() to program the L1.2 parameters properly > > > > > > before > > > > > > enable L1.2 bits of L1 PM Substates Control Register in > > > > > > __pci_enable_link_state(). > > > > > > > > > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218394 > > > > > > Signed-off-by: Jian-Hong Pan <jhp@endlessos.org> > > > > > > --- > > > > > > v2: > > > > > > - Prepare the PCIe LTR parameters before enable L1 Substates > > > > > > > > > > > > v3: > > > > > > - Only enable supported features for the L1 Substates part > > > > > > > > > > > > v4: > > > > > > - Focus on fixing L1.2 parameters, instead of re-initializing whole > > > > > > L1SS > > > > > > > > > > > > v5: > > > > > > - Fix typo and commit message > > > > > > - Split introducing aspm_get_l1ss_cap() to "PCI/ASPM: Introduce > > > > > > aspm_get_l1ss_cap()" > > > > > > > > > > > > v6: > > > > > > - Skipped > > > > > > > > > > > > v7: > > > > > > - Pick back and rebase on the new version kernel > > > > > > - Drop the link state flag check. And, always config link state's > > > > > > timing > > > > > > parameters > > > > > > > > > > > > v8: > > > > > > - Because pcie_aspm_get_link() might return the link as NULL, move > > > > > > getting the link's parent and child devices after check the link is > > > > > > not NULL. This avoids NULL memory access. > > > > > > > > > > > > drivers/pci/pcie/aspm.c | 15 +++++++++++++++ > > > > > > 1 file changed, 15 insertions(+) > > > > > > > > > > > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c > > > > > > index 5db1044c9895..55ff1d26fcea 100644 > > > > > > --- a/drivers/pci/pcie/aspm.c > > > > > > +++ b/drivers/pci/pcie/aspm.c > > > > > > @@ -1411,9 +1411,15 @@ EXPORT_SYMBOL(pci_disable_link_state); > > > > > > static int __pci_enable_link_state(struct pci_dev *pdev, int state, > > > > > > bool > > > > > > locked) > > > > > > { > > > > > > struct pcie_link_state *link = pcie_aspm_get_link(pdev); > > > > > > + u32 parent_l1ss_cap, child_l1ss_cap; > > > > > > + struct pci_dev *parent, *child; > > > > > > > > > > > > if (!link) > > > > > > return -EINVAL; > > > > > > + > > > > > > + parent = link->pdev; > > > > > > + child = link->downstream; > > > > > > + > > > > > > /* > > > > > > * A driver requested that ASPM be enabled on this device, but > > > > > > * if we don't have permission to manage ASPM (e.g., on ACPI > > > > > > @@ -1428,6 +1434,15 @@ static int __pci_enable_link_state(struct > > > > > > pci_dev > > > > > > *pdev, int state, bool locked) > > > > > > if (!locked) > > > > > > down_read(&pci_bus_sem); > > > > > > mutex_lock(&aspm_lock); > > > > > > + /* > > > > > > + * Ensure L1.2 parameters: Common_Mode_Restore_Times, > > > > > > T_POWER_ON and > > > > > > + * LTR_L1.2_THRESHOLD are programmed properly before enable > > > > > > bits for > > > > > > + * L1.2, per PCIe r6.0, sec 5.5.4. > > > > > > + */ > > > > > > + parent_l1ss_cap = aspm_get_l1ss_cap(parent); > > > > > > + child_l1ss_cap = aspm_get_l1ss_cap(child); > > > > > > + aspm_calc_l12_info(link, parent_l1ss_cap, child_l1ss_cap); > > > > > > > > I still don't think this is the place to recalculate the L1.2 parameters > > > > especially when know the calculation was done but was cleared by > > > > pci_bus_reset(). Can't we just do a pci_save/restore_state() before/after > > > > pci_bus_reset() in vmd.c? > > > > > > I have not thought pci_save/restore_state() around pci_bus_reset() > > > before. It is an interesting direction. > > > > > > So, I prepare modification below for test. Include "[PATCH v8 1/4] > > > PCI: vmd: Set PCI devices to D0 before enable PCI PM's L1 substates", > > > too. Then, both the PCIe bridge and the PCIe device have the same > > > LTR_L1.2_THRESHOLD 101376ns as expected. > > > > > > diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c > > > index bbf4a47e7b31..6b8dd4f30127 100644 > > > --- a/drivers/pci/controller/vmd.c > > > +++ b/drivers/pci/controller/vmd.c > > > @@ -727,6 +727,18 @@ static void vmd_copy_host_bridge_flags(struct > > > pci_host_bridge *root_bridge, > > > vmd_bridge->native_dpc = root_bridge->native_dpc; > > > } > > > > > > +static int vmd_pci_save_state(struct pci_dev *pdev, void *userdata) > > > +{ > > > + pci_save_state(pdev); > > > + return 0; > > > +} > > > + > > > +static int vmd_pci_restore_state(struct pci_dev *pdev, void *userdata) > > > +{ > > > + pci_restore_state(pdev); > > > + return 0; > > > +} > > > + > > > /* > > > * Enable ASPM and LTR settings on devices that aren't configured by BIOS. > > > */ > > > @@ -927,6 +939,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, > > > unsigned long features) > > > pci_scan_child_bus(vmd->bus); > > > vmd_domain_reset(vmd); > > > > > > + pci_walk_bus(vmd->bus, vmd_pci_save_state, NULL); > > > /* When Intel VMD is enabled, the OS does not discover the Root > > > Ports > > > * owned by Intel VMD within the MMCFG space. pci_reset_bus() > > > applies > > > * a reset to the parent of the PCI device supplied as argument. > > > This > > > @@ -945,6 +958,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, > > > unsigned long features) > > > break; > > > } > > > } > > > + pci_walk_bus(vmd->bus, vmd_pci_restore_state, NULL); > > > > Why not call pci_reset_bus() (or __pci_reset_bus()) then in > > vmd_enable_domain() which preserves state unlike pci_reset_bus()? > > > > (Don't tell me naming of these functions is a horrible mess. :-/) > > Hmm. So this *is* calling pci_reset_bus(). Yeah, I managed to get confused by the names myself, I somehow ended up thinking it calls pci_bus_reset() which is not correct... > L1.2 configuration has specific > ordering requirements for changes to parent & child devices. Could be why it's > not getting restored properly. Indeed, it has to be something else since the patch above doesn't even restore anything because dev->state_saved should get set to false by the first pci_restore_state() called from __pci_reset_bus() -> pci_bus_restore_locked() -> pci_dev_restore(), I think!?
Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> 於 2024年8月8日 週四 下午5:49寫道: > > On Wed, 7 Aug 2024, David E. Box wrote: > > > On Wed, 2024-08-07 at 14:18 +0300, Ilpo Järvinen wrote: > > > On Wed, 7 Aug 2024, Jian-Hong Pan wrote: > > > > > > > David E. Box <david.e.box@linux.intel.com> 於 2024年8月6日 週二 上午4:26寫道: > > > > > > > > > > Hi Jian-Hong, > > > > > > > > > > On Fri, 2024-08-02 at 16:24 +0800, Jian-Hong Pan wrote: > > > > > > Jian-Hong Pan <jhp@endlessos.org> 於 2024年7月19日 週五 下午4:04寫道: > > > > > > > > > > > > > > Currently, when enable link's L1.2 features with > > > > > > > __pci_enable_link_state(), > > > > > > > it configs the link directly without ensuring related L1.2 parameters, > > > > > > > such > > > > > > > as T_POWER_ON, Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD have > > > > > > > been > > > > > > > programmed. > > > > > > > > > > > > > > This leads the link's L1.2 between PCIe Root Port and child device > > > > > > > gets > > > > > > > wrong configs when a caller tries to enabled it. > > > > > > > > > > > > > > Here is a failed example on ASUS B1400CEAE with enabled VMD: > > > > > > > > > > > > > > 10000:e0:06.0 PCI bridge: Intel Corporation 11th Gen Core Processor > > > > > > > PCIe > > > > > > > Controller (rev 01) (prog-if 00 [Normal decode]) > > > > > > > ... > > > > > > > Capabilities: [200 v1] L1 PM Substates > > > > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ > > > > > > > L1_PM_Substates+ > > > > > > > PortCommonModeRestoreTime=45us PortTPowerOnTime=50us > > > > > > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > > > > > > T_CommonMode=45us LTR1.2_Threshold=101376ns > > > > > > > L1SubCtl2: T_PwrOn=50us > > > > > > > > > > > > > > 10000:e1:00.0 Non-Volatile memory controller: Sandisk Corp WD Blue > > > > > > > SN550 > > > > > > > NVMe SSD (rev 01) (prog-if 02 [NVM Express]) > > > > > > > ... > > > > > > > Capabilities: [900 v1] L1 PM Substates > > > > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > > > > > > L1_PM_Substates+ > > > > > > > PortCommonModeRestoreTime=32us PortTPowerOnTime=10us > > > > > > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > > > > > > T_CommonMode=0us LTR1.2_Threshold=0ns > > > > > > > L1SubCtl2: T_PwrOn=10us > > > > > > > > > > > > > > According to "PCIe r6.0, sec 5.5.4", before enabling ASPM L1.2 on the > > > > > > > PCIe > > > > > > > Root Port and the child NVMe, they should be programmed with the same > > > > > > > LTR1.2_Threshold value. However, they have different values in this > > > > > > > case. > > > > > > > > > > > > > > Invoke aspm_calc_l12_info() to program the L1.2 parameters properly > > > > > > > before > > > > > > > enable L1.2 bits of L1 PM Substates Control Register in > > > > > > > __pci_enable_link_state(). > > > > > > > > > > > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218394 > > > > > > > Signed-off-by: Jian-Hong Pan <jhp@endlessos.org> > > > > > > > --- > > > > > > > v2: > > > > > > > - Prepare the PCIe LTR parameters before enable L1 Substates > > > > > > > > > > > > > > v3: > > > > > > > - Only enable supported features for the L1 Substates part > > > > > > > > > > > > > > v4: > > > > > > > - Focus on fixing L1.2 parameters, instead of re-initializing whole > > > > > > > L1SS > > > > > > > > > > > > > > v5: > > > > > > > - Fix typo and commit message > > > > > > > - Split introducing aspm_get_l1ss_cap() to "PCI/ASPM: Introduce > > > > > > > aspm_get_l1ss_cap()" > > > > > > > > > > > > > > v6: > > > > > > > - Skipped > > > > > > > > > > > > > > v7: > > > > > > > - Pick back and rebase on the new version kernel > > > > > > > - Drop the link state flag check. And, always config link state's > > > > > > > timing > > > > > > > parameters > > > > > > > > > > > > > > v8: > > > > > > > - Because pcie_aspm_get_link() might return the link as NULL, move > > > > > > > getting the link's parent and child devices after check the link is > > > > > > > not NULL. This avoids NULL memory access. > > > > > > > > > > > > > > drivers/pci/pcie/aspm.c | 15 +++++++++++++++ > > > > > > > 1 file changed, 15 insertions(+) > > > > > > > > > > > > > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c > > > > > > > index 5db1044c9895..55ff1d26fcea 100644 > > > > > > > --- a/drivers/pci/pcie/aspm.c > > > > > > > +++ b/drivers/pci/pcie/aspm.c > > > > > > > @@ -1411,9 +1411,15 @@ EXPORT_SYMBOL(pci_disable_link_state); > > > > > > > static int __pci_enable_link_state(struct pci_dev *pdev, int state, > > > > > > > bool > > > > > > > locked) > > > > > > > { > > > > > > > struct pcie_link_state *link = pcie_aspm_get_link(pdev); > > > > > > > + u32 parent_l1ss_cap, child_l1ss_cap; > > > > > > > + struct pci_dev *parent, *child; > > > > > > > > > > > > > > if (!link) > > > > > > > return -EINVAL; > > > > > > > + > > > > > > > + parent = link->pdev; > > > > > > > + child = link->downstream; > > > > > > > + > > > > > > > /* > > > > > > > * A driver requested that ASPM be enabled on this device, but > > > > > > > * if we don't have permission to manage ASPM (e.g., on ACPI > > > > > > > @@ -1428,6 +1434,15 @@ static int __pci_enable_link_state(struct > > > > > > > pci_dev > > > > > > > *pdev, int state, bool locked) > > > > > > > if (!locked) > > > > > > > down_read(&pci_bus_sem); > > > > > > > mutex_lock(&aspm_lock); > > > > > > > + /* > > > > > > > + * Ensure L1.2 parameters: Common_Mode_Restore_Times, > > > > > > > T_POWER_ON and > > > > > > > + * LTR_L1.2_THRESHOLD are programmed properly before enable > > > > > > > bits for > > > > > > > + * L1.2, per PCIe r6.0, sec 5.5.4. > > > > > > > + */ > > > > > > > + parent_l1ss_cap = aspm_get_l1ss_cap(parent); > > > > > > > + child_l1ss_cap = aspm_get_l1ss_cap(child); > > > > > > > + aspm_calc_l12_info(link, parent_l1ss_cap, child_l1ss_cap); > > > > > > > > > > I still don't think this is the place to recalculate the L1.2 parameters > > > > > especially when know the calculation was done but was cleared by > > > > > pci_bus_reset(). Can't we just do a pci_save/restore_state() before/after > > > > > pci_bus_reset() in vmd.c? > > > > > > > > I have not thought pci_save/restore_state() around pci_bus_reset() > > > > before. It is an interesting direction. > > > > > > > > So, I prepare modification below for test. Include "[PATCH v8 1/4] > > > > PCI: vmd: Set PCI devices to D0 before enable PCI PM's L1 substates", > > > > too. Then, both the PCIe bridge and the PCIe device have the same > > > > LTR_L1.2_THRESHOLD 101376ns as expected. > > > > > > > > diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c > > > > index bbf4a47e7b31..6b8dd4f30127 100644 > > > > --- a/drivers/pci/controller/vmd.c > > > > +++ b/drivers/pci/controller/vmd.c > > > > @@ -727,6 +727,18 @@ static void vmd_copy_host_bridge_flags(struct > > > > pci_host_bridge *root_bridge, > > > > vmd_bridge->native_dpc = root_bridge->native_dpc; > > > > } > > > > > > > > +static int vmd_pci_save_state(struct pci_dev *pdev, void *userdata) > > > > +{ > > > > + pci_save_state(pdev); > > > > + return 0; > > > > +} > > > > + > > > > +static int vmd_pci_restore_state(struct pci_dev *pdev, void *userdata) > > > > +{ > > > > + pci_restore_state(pdev); > > > > + return 0; > > > > +} > > > > + > > > > /* > > > > * Enable ASPM and LTR settings on devices that aren't configured by BIOS. > > > > */ > > > > @@ -927,6 +939,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, > > > > unsigned long features) > > > > pci_scan_child_bus(vmd->bus); > > > > vmd_domain_reset(vmd); > > > > > > > > + pci_walk_bus(vmd->bus, vmd_pci_save_state, NULL); > > > > /* When Intel VMD is enabled, the OS does not discover the Root > > > > Ports > > > > * owned by Intel VMD within the MMCFG space. pci_reset_bus() > > > > applies > > > > * a reset to the parent of the PCI device supplied as argument. > > > > This > > > > @@ -945,6 +958,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, > > > > unsigned long features) > > > > break; > > > > } > > > > } > > > > + pci_walk_bus(vmd->bus, vmd_pci_restore_state, NULL); > > > > > > Why not call pci_reset_bus() (or __pci_reset_bus()) then in > > > vmd_enable_domain() which preserves state unlike pci_reset_bus()? > > > > > > (Don't tell me naming of these functions is a horrible mess. :-/) > > > > Hmm. So this *is* calling pci_reset_bus(). > > Yeah, I managed to get confused by the names myself, I somehow > ended up thinking it calls pci_bus_reset() which is not correct... > > > L1.2 configuration has specific > > ordering requirements for changes to parent & child devices. Could be why it's > > not getting restored properly. > > Indeed, it has to be something else since the patch above doesn't even > restore anything because dev->state_saved should get set to false by the > first pci_restore_state() called from > __pci_reset_bus() -> pci_bus_restore_locked() -> pci_dev_restore(), I > think!? Inspired by Ilpo's comment. I add some debug messages based on linux-next's tag 'next-20240809' to understand the code path of pci_reset_bus(): diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index ffaaca0978cb..3ee71374f1de 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -5133,8 +5133,10 @@ static void pci_dev_save_and_disable(struct pci_dev *dev) * races with ->remove() by the device lock, which must be held by * the caller. */ - if (err_handler && err_handler->reset_prepare) + if (err_handler && err_handler->reset_prepare) { + pci_info(dev, "%s: %pF\n", __func__, err_handler->reset_prepare); err_handler->reset_prepare(dev); + } /* * Wake-up device prior to save. PM registers default to D0 after @@ -5144,6 +5146,7 @@ static void pci_dev_save_and_disable(struct pci_dev *dev) pci_set_power_state(dev, PCI_D0); pci_save_state(dev); + pci_info(dev, "%s: PCI state_saved is %s\n", __func__, dev->state_saved ? "true" : "false"); /* * Disable the device by clearing the Command register, except for * INTx-disable which is set. This not only disables MMIO and I/O port @@ -5655,6 +5658,10 @@ static void pci_bus_save_and_disable_locked(struct pci_bus *bus) struct pci_dev *dev; list_for_each_entry(dev, &bus->devices, bus_list) { + pci_info(dev, "%s: PCI state_saved is %s, and %s subordinate\n", + __func__, + dev->state_saved ? "true" : "false", + dev->subordinate ? "has" : "does not have"); pci_dev_save_and_disable(dev); if (dev->subordinate) pci_bus_save_and_disable_locked(dev->subordinate); @@ -5671,6 +5678,10 @@ static void pci_bus_restore_locked(struct pci_bus *bus) struct pci_dev *dev; list_for_each_entry(dev, &bus->devices, bus_list) { + pci_info(dev, "%s: PCI state_saved is %s, and %s subordinate\n", + __func__, + dev->state_saved ? "true" : "false", + dev->subordinate ? "has" : "does not have"); pci_dev_restore(dev); if (dev->subordinate) pci_bus_restore_locked(dev->subordinate); @@ -5786,8 +5797,10 @@ static int pci_bus_reset(struct pci_bus *bus, bool probe) if (!bus->self || !pci_bus_resettable(bus)) return -ENOTTY; - if (probe) + if (probe) { + pci_info(bus->self, "%s: probe is true. So return 0 directly", __func__); return 0; + } pci_bus_lock(bus); @@ -5858,10 +5871,12 @@ static int __pci_reset_bus(struct pci_bus *bus) int rc; rc = pci_bus_reset(bus, PCI_RESET_PROBE); + pci_info(bus->self, "%s: pci_bus_reset() returns %d\n", __func__, rc); if (rc) return rc; if (pci_bus_trylock(bus)) { + pci_info(bus->self, "%s: pci_bus_trylock() returns true\n", __func__); pci_bus_save_and_disable_locked(bus); might_sleep(); rc = pci_bridge_secondary_bus_reset(bus->self); @@ -5881,6 +5896,7 @@ static int __pci_reset_bus(struct pci_bus *bus) */ int pci_reset_bus(struct pci_dev *pdev) { + pci_info(pdev, "%s: %s", __func__, !pci_probe_reset_slot(pdev->slot) ? "true" : "false"); return (!pci_probe_reset_slot(pdev->slot)) ? __pci_reset_slot(pdev->slot) : __pci_reset_bus(pdev->bus); } And, have the information of VMD PCIe devices with the built kernel: 10000:e0:06.0 PCI bridge [0604]: Intel Corporation 11th Gen Core Processor PCIe Controller [8086:9a09] (rev 01) (prog-if 00 [Normal decode]) ... Capabilities: [200 v1] L1 PM Substates L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+ PortCommonModeRestoreTime=45us PortTPowerOnTime=50us L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- T_CommonMode=0us LTR1.2_Threshold=0ns L1SubCtl2: T_PwrOn=0us 10000:e1:00.0 Non-Volatile memory controller [0108]: Sandisk Corp WD Blue SN550 NVMe SSD [15b7:5009] (rev 01) (prog-if 02 [NVM Express]) ... Capabilities: [900 v1] L1 PM Substates L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+ PortCommonModeRestoreTime=32us PortTPowerOnTime=10us L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- T_CommonMode=0us LTR1.2_Threshold=101376ns L1SubCtl2: T_PwrOn=50us We can see the NVMe has expected LTR1.2_Threshold=101376ns, but the PCIe bridge has LTR1.2_Threshold=0ns. Then, check the dmesg. I notice the debug messages: pci 10000:e0:06.0: PCI bridge to [bus e1] pci 10000:e0:06.0: Primary bus is hard wired to 0 pci 10000:e1:00.0: pci_reset_bus: false pci 10000:e0:06.0: pci_bus_reset: probe is true. So return 0 directly pci 10000:e0:06.0: __pci_reset_bus: pci_bus_reset() returns 0 pci 10000:e0:06.0: __pci_reset_bus: pci_bus_trylock() returns true pci 10000:e1:00.0: pci_bus_save_and_disable_locked: PCI state_saved is false, and does not have subordinate pci 10000:e1:00.0: pci_dev_save_and_disable: PCI state_saved is true Freeing initrd memory: 75236K pci 10000:e1:00.0: pci_bus_restore_locked: PCI state_saved is true, and does not have subordinate So, the code path is: vmd_enable_domain() pci_reset_bus() __pci_reset_bus() pci_bus_reset() pci_bus_save_and_disable_locked() pci_dev_save_and_disable() pci_bus_restore_locked() pci_dev_restore() And, from the debug messages, I learned only NVMe 10000:e1:00.0 does pci_save/restore_state. But, the PCIe bridge 10000:e0:06.0 does not. So, PCIe bridge 10000:e0:06.0 does not restore state correctly. Besides, it is NVMe 10000:e1:00.0's bus [e1] been reset, not the VMD's bus in vmd_enable_domain(). * Bus "e1" has only NVMe 10000:e1:00.0 * VMD's bus in vmd_enable_domain() has PCIe bridge 10000:e0:06.0, NVMe 10000:e1:00.0 and SATA Controller 10000:e0:17.0. Here is the PCI tree: -+-[0000:00]-+-00.0 Intel Corporation Device 9a04 | +-02.0 Intel Corporation Tiger Lake-LP GT2 [UHD Graphics G4] | +-04.0 Intel Corporation TigerLake-LP Dynamic Tuning Processor Participant | +-06.0 Intel Corporation RST VMD Managed Controller | +-07.0-[01-2b]-- | +-08.0 Intel Corporation GNA Scoring Accelerator module | +-0a.0 Intel Corporation Tigerlake Telemetry Aggregator Driver | +-0d.0 Intel Corporation Tiger Lake-LP Thunderbolt 4 USB Controller | +-0d.2 Intel Corporation Tiger Lake-LP Thunderbolt 4 NHI #0 | +-0e.0 Intel Corporation Volume Management Device NVMe RAID Controller | +-14.0 Intel Corporation Tiger Lake-LP USB 3.2 Gen 2x1 xHCI Host Controller | +-14.2 Intel Corporation Tiger Lake-LP Shared SRAM | +-14.3 Intel Corporation Wi-Fi 6 AX201 | +-15.0 Intel Corporation Tiger Lake-LP Serial IO I2C Controller #0 | +-15.1 Intel Corporation Tiger Lake-LP Serial IO I2C Controller #1 | +-16.0 Intel Corporation Tiger Lake-LP Management Engine Interface | +-17.0 Intel Corporation RST VMD Managed Controller | +-1f.0 Intel Corporation Tiger Lake-LP LPC Controller | +-1f.3 Intel Corporation Tiger Lake-LP Smart Sound Technology Audio Controller | +-1f.4 Intel Corporation Tiger Lake-LP SMBus Controller | +-1f.5 Intel Corporation Tiger Lake-LP SPI Controller | \-1f.6 Intel Corporation Ethernet Connection (13) I219-V \-[10000:e0]-+-06.0-[e1]----00.0 Sandisk Corp WD Blue SN550 NVMe SSD \-17.0 Intel Corporation Tiger Lake-LP SATA Controller According the findings above, to ensure the devices on the VMD bus have correctly states, seems pci_save_state() all the devices before pci_reset_bus(), and pci_restore_state() all the devices after pci_reset_bus() is the correct answer. Jian-Hong Pan
On Mon, 12 Aug 2024, Jian-Hong Pan wrote: > Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> 於 2024年8月8日 週四 下午5:49寫道: > > On Wed, 7 Aug 2024, David E. Box wrote: > > > On Wed, 2024-08-07 at 14:18 +0300, Ilpo Järvinen wrote: > > > > On Wed, 7 Aug 2024, Jian-Hong Pan wrote: > > > > > > > > > David E. Box <david.e.box@linux.intel.com> 於 2024年8月6日 週二 上午4:26寫道: > > > > > > > > > > > > Hi Jian-Hong, > > > > > > > > > > > > On Fri, 2024-08-02 at 16:24 +0800, Jian-Hong Pan wrote: > > > > > > > Jian-Hong Pan <jhp@endlessos.org> 於 2024年7月19日 週五 下午4:04寫道: > > > > > > > > > > > > > > > > Currently, when enable link's L1.2 features with > > > > > > > > __pci_enable_link_state(), > > > > > > > > it configs the link directly without ensuring related L1.2 parameters, > > > > > > > > such > > > > > > > > as T_POWER_ON, Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD have > > > > > > > > been > > > > > > > > programmed. > > > > > > > > > > > > > > > > This leads the link's L1.2 between PCIe Root Port and child device > > > > > > > > gets > > > > > > > > wrong configs when a caller tries to enabled it. > > > > > > > > > > > > > > > > Here is a failed example on ASUS B1400CEAE with enabled VMD: > > > > > > > > > > > > > > > > 10000:e0:06.0 PCI bridge: Intel Corporation 11th Gen Core Processor > > > > > > > > PCIe > > > > > > > > Controller (rev 01) (prog-if 00 [Normal decode]) > > > > > > > > ... > > > > > > > > Capabilities: [200 v1] L1 PM Substates > > > > > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ > > > > > > > > L1_PM_Substates+ > > > > > > > > PortCommonModeRestoreTime=45us PortTPowerOnTime=50us > > > > > > > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > > > > > > > T_CommonMode=45us LTR1.2_Threshold=101376ns > > > > > > > > L1SubCtl2: T_PwrOn=50us > > > > > > > > > > > > > > > > 10000:e1:00.0 Non-Volatile memory controller: Sandisk Corp WD Blue > > > > > > > > SN550 > > > > > > > > NVMe SSD (rev 01) (prog-if 02 [NVM Express]) > > > > > > > > ... > > > > > > > > Capabilities: [900 v1] L1 PM Substates > > > > > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > > > > > > > L1_PM_Substates+ > > > > > > > > PortCommonModeRestoreTime=32us PortTPowerOnTime=10us > > > > > > > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > > > > > > > T_CommonMode=0us LTR1.2_Threshold=0ns > > > > > > > > L1SubCtl2: T_PwrOn=10us > > > > > > > > > > > > > > > > According to "PCIe r6.0, sec 5.5.4", before enabling ASPM L1.2 on the > > > > > > > > PCIe > > > > > > > > Root Port and the child NVMe, they should be programmed with the same > > > > > > > > LTR1.2_Threshold value. However, they have different values in this > > > > > > > > case. > > > > > > > > > > > > > > > > Invoke aspm_calc_l12_info() to program the L1.2 parameters properly > > > > > > > > before > > > > > > > > enable L1.2 bits of L1 PM Substates Control Register in > > > > > > > > __pci_enable_link_state(). > > > > > > > > > > > > > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218394 > > > > > > > > Signed-off-by: Jian-Hong Pan <jhp@endlessos.org> > > > > > > > > --- > > > > > > > > v2: > > > > > > > > - Prepare the PCIe LTR parameters before enable L1 Substates > > > > > > > > > > > > > > > > v3: > > > > > > > > - Only enable supported features for the L1 Substates part > > > > > > > > > > > > > > > > v4: > > > > > > > > - Focus on fixing L1.2 parameters, instead of re-initializing whole > > > > > > > > L1SS > > > > > > > > > > > > > > > > v5: > > > > > > > > - Fix typo and commit message > > > > > > > > - Split introducing aspm_get_l1ss_cap() to "PCI/ASPM: Introduce > > > > > > > > aspm_get_l1ss_cap()" > > > > > > > > > > > > > > > > v6: > > > > > > > > - Skipped > > > > > > > > > > > > > > > > v7: > > > > > > > > - Pick back and rebase on the new version kernel > > > > > > > > - Drop the link state flag check. And, always config link state's > > > > > > > > timing > > > > > > > > parameters > > > > > > > > > > > > > > > > v8: > > > > > > > > - Because pcie_aspm_get_link() might return the link as NULL, move > > > > > > > > getting the link's parent and child devices after check the link is > > > > > > > > not NULL. This avoids NULL memory access. > > > > > > > > > > > > > > > > drivers/pci/pcie/aspm.c | 15 +++++++++++++++ > > > > > > > > 1 file changed, 15 insertions(+) > > > > > > > > > > > > > > > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c > > > > > > > > index 5db1044c9895..55ff1d26fcea 100644 > > > > > > > > --- a/drivers/pci/pcie/aspm.c > > > > > > > > +++ b/drivers/pci/pcie/aspm.c > > > > > > > > @@ -1411,9 +1411,15 @@ EXPORT_SYMBOL(pci_disable_link_state); > > > > > > > > static int __pci_enable_link_state(struct pci_dev *pdev, int state, > > > > > > > > bool > > > > > > > > locked) > > > > > > > > { > > > > > > > > struct pcie_link_state *link = pcie_aspm_get_link(pdev); > > > > > > > > + u32 parent_l1ss_cap, child_l1ss_cap; > > > > > > > > + struct pci_dev *parent, *child; > > > > > > > > > > > > > > > > if (!link) > > > > > > > > return -EINVAL; > > > > > > > > + > > > > > > > > + parent = link->pdev; > > > > > > > > + child = link->downstream; > > > > > > > > + > > > > > > > > /* > > > > > > > > * A driver requested that ASPM be enabled on this device, but > > > > > > > > * if we don't have permission to manage ASPM (e.g., on ACPI > > > > > > > > @@ -1428,6 +1434,15 @@ static int __pci_enable_link_state(struct > > > > > > > > pci_dev > > > > > > > > *pdev, int state, bool locked) > > > > > > > > if (!locked) > > > > > > > > down_read(&pci_bus_sem); > > > > > > > > mutex_lock(&aspm_lock); > > > > > > > > + /* > > > > > > > > + * Ensure L1.2 parameters: Common_Mode_Restore_Times, > > > > > > > > T_POWER_ON and > > > > > > > > + * LTR_L1.2_THRESHOLD are programmed properly before enable > > > > > > > > bits for > > > > > > > > + * L1.2, per PCIe r6.0, sec 5.5.4. > > > > > > > > + */ > > > > > > > > + parent_l1ss_cap = aspm_get_l1ss_cap(parent); > > > > > > > > + child_l1ss_cap = aspm_get_l1ss_cap(child); > > > > > > > > + aspm_calc_l12_info(link, parent_l1ss_cap, child_l1ss_cap); > > > > > > > > > > > > I still don't think this is the place to recalculate the L1.2 parameters > > > > > > especially when know the calculation was done but was cleared by > > > > > > pci_bus_reset(). Can't we just do a pci_save/restore_state() before/after > > > > > > pci_bus_reset() in vmd.c? > > > > > > > > > > I have not thought pci_save/restore_state() around pci_bus_reset() > > > > > before. It is an interesting direction. > > > > > > > > > > So, I prepare modification below for test. Include "[PATCH v8 1/4] > > > > > PCI: vmd: Set PCI devices to D0 before enable PCI PM's L1 substates", > > > > > too. Then, both the PCIe bridge and the PCIe device have the same > > > > > LTR_L1.2_THRESHOLD 101376ns as expected. > > > > > > > > > > diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c > > > > > index bbf4a47e7b31..6b8dd4f30127 100644 > > > > > --- a/drivers/pci/controller/vmd.c > > > > > +++ b/drivers/pci/controller/vmd.c > > > > > @@ -727,6 +727,18 @@ static void vmd_copy_host_bridge_flags(struct > > > > > pci_host_bridge *root_bridge, > > > > > vmd_bridge->native_dpc = root_bridge->native_dpc; > > > > > } > > > > > > > > > > +static int vmd_pci_save_state(struct pci_dev *pdev, void *userdata) > > > > > +{ > > > > > + pci_save_state(pdev); > > > > > + return 0; > > > > > +} > > > > > + > > > > > +static int vmd_pci_restore_state(struct pci_dev *pdev, void *userdata) > > > > > +{ > > > > > + pci_restore_state(pdev); > > > > > + return 0; > > > > > +} > > > > > + > > > > > /* > > > > > * Enable ASPM and LTR settings on devices that aren't configured by BIOS. > > > > > */ > > > > > @@ -927,6 +939,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, > > > > > unsigned long features) > > > > > pci_scan_child_bus(vmd->bus); > > > > > vmd_domain_reset(vmd); > > > > > > > > > > + pci_walk_bus(vmd->bus, vmd_pci_save_state, NULL); > > > > > /* When Intel VMD is enabled, the OS does not discover the Root > > > > > Ports > > > > > * owned by Intel VMD within the MMCFG space. pci_reset_bus() > > > > > applies > > > > > * a reset to the parent of the PCI device supplied as argument. > > > > > This > > > > > @@ -945,6 +958,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, > > > > > unsigned long features) > > > > > break; > > > > > } > > > > > } > > > > > + pci_walk_bus(vmd->bus, vmd_pci_restore_state, NULL); > > > > > > > > Why not call pci_reset_bus() (or __pci_reset_bus()) then in > > > > vmd_enable_domain() which preserves state unlike pci_reset_bus()? > > > > > > > > (Don't tell me naming of these functions is a horrible mess. :-/) > > > > > > Hmm. So this *is* calling pci_reset_bus(). > > > > Yeah, I managed to get confused by the names myself, I somehow > > ended up thinking it calls pci_bus_reset() which is not correct... > > > > > L1.2 configuration has specific > > > ordering requirements for changes to parent & child devices. Could be why it's > > > not getting restored properly. > > > > Indeed, it has to be something else since the patch above doesn't even > > restore anything because dev->state_saved should get set to false by the > > first pci_restore_state() called from > > __pci_reset_bus() -> pci_bus_restore_locked() -> pci_dev_restore(), I > > think!? > > Inspired by Ilpo's comment. I add some debug messages based on > linux-next's tag 'next-20240809' to understand the code path of > pci_reset_bus(): > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index ffaaca0978cb..3ee71374f1de 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -5133,8 +5133,10 @@ static void pci_dev_save_and_disable(struct pci_dev *dev) > * races with ->remove() by the device lock, which must be held by > * the caller. > */ > - if (err_handler && err_handler->reset_prepare) > + if (err_handler && err_handler->reset_prepare) { > + pci_info(dev, "%s: %pF\n", __func__, > err_handler->reset_prepare); > err_handler->reset_prepare(dev); > + } > > /* > * Wake-up device prior to save. PM registers default to D0 after > @@ -5144,6 +5146,7 @@ static void pci_dev_save_and_disable(struct pci_dev *dev) > pci_set_power_state(dev, PCI_D0); > > pci_save_state(dev); > + pci_info(dev, "%s: PCI state_saved is %s\n", __func__, > dev->state_saved ? "true" : "false"); > /* > * Disable the device by clearing the Command register, except for > * INTx-disable which is set. This not only disables MMIO and I/O port > @@ -5655,6 +5658,10 @@ static void > pci_bus_save_and_disable_locked(struct pci_bus *bus) > struct pci_dev *dev; > > list_for_each_entry(dev, &bus->devices, bus_list) { > + pci_info(dev, "%s: PCI state_saved is %s, and %s subordinate\n", > + __func__, > + dev->state_saved ? "true" : "false", > + dev->subordinate ? "has" : "does not have"); > pci_dev_save_and_disable(dev); > if (dev->subordinate) > pci_bus_save_and_disable_locked(dev->subordinate); > @@ -5671,6 +5678,10 @@ static void pci_bus_restore_locked(struct pci_bus *bus) > struct pci_dev *dev; > > list_for_each_entry(dev, &bus->devices, bus_list) { > + pci_info(dev, "%s: PCI state_saved is %s, and %s subordinate\n", > + __func__, > + dev->state_saved ? "true" : "false", > + dev->subordinate ? "has" : "does not have"); > pci_dev_restore(dev); > if (dev->subordinate) > pci_bus_restore_locked(dev->subordinate); > @@ -5786,8 +5797,10 @@ static int pci_bus_reset(struct pci_bus *bus, bool probe) > if (!bus->self || !pci_bus_resettable(bus)) > return -ENOTTY; > > - if (probe) > + if (probe) { > + pci_info(bus->self, "%s: probe is true. So return 0 > directly", __func__); > return 0; > + } > > pci_bus_lock(bus); > > @@ -5858,10 +5871,12 @@ static int __pci_reset_bus(struct pci_bus *bus) > int rc; > > rc = pci_bus_reset(bus, PCI_RESET_PROBE); > + pci_info(bus->self, "%s: pci_bus_reset() returns %d\n", __func__, rc); > if (rc) > return rc; > > if (pci_bus_trylock(bus)) { > + pci_info(bus->self, "%s: pci_bus_trylock() returns > true\n", __func__); > pci_bus_save_and_disable_locked(bus); > might_sleep(); > rc = pci_bridge_secondary_bus_reset(bus->self); > @@ -5881,6 +5896,7 @@ static int __pci_reset_bus(struct pci_bus *bus) > */ > int pci_reset_bus(struct pci_dev *pdev) > { > + pci_info(pdev, "%s: %s", __func__, > !pci_probe_reset_slot(pdev->slot) ? "true" : "false"); > return (!pci_probe_reset_slot(pdev->slot)) ? > __pci_reset_slot(pdev->slot) : __pci_reset_bus(pdev->bus); > } > > And, have the information of VMD PCIe devices with the built kernel: > > 10000:e0:06.0 PCI bridge [0604]: Intel Corporation 11th Gen Core > Processor PCIe Controller [8086:9a09] (rev 01) (prog-if 00 [Normal > decode]) > ... > Capabilities: [200 v1] L1 PM Substates > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+ > PortCommonModeRestoreTime=45us PortTPowerOnTime=50us > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > T_CommonMode=0us LTR1.2_Threshold=0ns > L1SubCtl2: T_PwrOn=0us > > 10000:e1:00.0 Non-Volatile memory controller [0108]: Sandisk Corp WD > Blue SN550 NVMe SSD [15b7:5009] (rev 01) (prog-if 02 [NVM Express]) > ... > Capabilities: [900 v1] L1 PM Substates > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+ > PortCommonModeRestoreTime=32us PortTPowerOnTime=10us > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > T_CommonMode=0us LTR1.2_Threshold=101376ns > L1SubCtl2: T_PwrOn=50us > > We can see the NVMe has expected LTR1.2_Threshold=101376ns, but the > PCIe bridge has LTR1.2_Threshold=0ns. This is now the other way around as in the original posting that had 0ns for 10000:e1:00.0 ?? Is this behavior even consistent or did you e.g. mess up some copy pasting somewhere? > Then, check the dmesg. I notice the debug messages: > > pci 10000:e0:06.0: PCI bridge to [bus e1] > pci 10000:e0:06.0: Primary bus is hard wired to 0 > pci 10000:e1:00.0: pci_reset_bus: false > pci 10000:e0:06.0: pci_bus_reset: probe is true. So return 0 directly > pci 10000:e0:06.0: __pci_reset_bus: pci_bus_reset() returns 0 > pci 10000:e0:06.0: __pci_reset_bus: pci_bus_trylock() returns true > pci 10000:e1:00.0: pci_bus_save_and_disable_locked: PCI state_saved is > false, and does not have subordinate > pci 10000:e1:00.0: pci_dev_save_and_disable: PCI state_saved is true > Freeing initrd memory: 75236K > pci 10000:e1:00.0: pci_bus_restore_locked: PCI state_saved is true, > and does not have subordinate > > So, the code path is: > > vmd_enable_domain() > pci_reset_bus() > __pci_reset_bus() > pci_bus_reset() > pci_bus_save_and_disable_locked() > pci_dev_save_and_disable() > pci_bus_restore_locked() > pci_dev_restore() > > And, from the debug messages, I learned only NVMe 10000:e1:00.0 does > pci_save/restore_state. But, the PCIe bridge 10000:e0:06.0 does not. > So, PCIe bridge 10000:e0:06.0 does not restore state correctly. It should not be necessary to restore the bridge's configuration as it ought to not be changed by the SBR itself, PCIe6 spec 7.5.1.3.13: "Port configuration registers must not be changed, except as required to update Port status." While the second part wording leaves some leeway, I don't think any of these field really fall under "Port status". > Besides, it is NVMe 10000:e1:00.0's bus [e1] been reset, not the VMD's > bus in vmd_enable_domain(). > * Bus "e1" has only NVMe 10000:e1:00.0 > * VMD's bus in vmd_enable_domain() has PCIe bridge 10000:e0:06.0, NVMe > 10000:e1:00.0 and SATA Controller 10000:e0:17.0. ...But even if those registers on the PCIe bridge were changed underneath against the spec, it's not clear from your debug log why pci_dev_restore() -> pci_restore_state() -> pci_restore_pcie_state() -> pci_restore_aspm_l1ss_state() did not restore also parent's LTR1.2_Threshold?? I think it should attempt to do that.
Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> 於 2024年9月2日 週一 下午11:44寫道: > > On Mon, 12 Aug 2024, Jian-Hong Pan wrote: > > > Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> 於 2024年8月8日 週四 下午5:49寫道: > > > On Wed, 7 Aug 2024, David E. Box wrote: > > > > On Wed, 2024-08-07 at 14:18 +0300, Ilpo Järvinen wrote: > > > > > On Wed, 7 Aug 2024, Jian-Hong Pan wrote: > > > > > > > > > > > David E. Box <david.e.box@linux.intel.com> 於 2024年8月6日 週二 上午4:26寫道: > > > > > > > > > > > > > > Hi Jian-Hong, > > > > > > > > > > > > > > On Fri, 2024-08-02 at 16:24 +0800, Jian-Hong Pan wrote: > > > > > > > > Jian-Hong Pan <jhp@endlessos.org> 於 2024年7月19日 週五 下午4:04寫道: > > > > > > > > > > > > > > > > > > Currently, when enable link's L1.2 features with > > > > > > > > > __pci_enable_link_state(), > > > > > > > > > it configs the link directly without ensuring related L1.2 parameters, > > > > > > > > > such > > > > > > > > > as T_POWER_ON, Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD have > > > > > > > > > been > > > > > > > > > programmed. > > > > > > > > > > > > > > > > > > This leads the link's L1.2 between PCIe Root Port and child device > > > > > > > > > gets > > > > > > > > > wrong configs when a caller tries to enabled it. > > > > > > > > > > > > > > > > > > Here is a failed example on ASUS B1400CEAE with enabled VMD: > > > > > > > > > > > > > > > > > > 10000:e0:06.0 PCI bridge: Intel Corporation 11th Gen Core Processor > > > > > > > > > PCIe > > > > > > > > > Controller (rev 01) (prog-if 00 [Normal decode]) > > > > > > > > > ... > > > > > > > > > Capabilities: [200 v1] L1 PM Substates > > > > > > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ > > > > > > > > > L1_PM_Substates+ > > > > > > > > > PortCommonModeRestoreTime=45us PortTPowerOnTime=50us > > > > > > > > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > > > > > > > > T_CommonMode=45us LTR1.2_Threshold=101376ns > > > > > > > > > L1SubCtl2: T_PwrOn=50us > > > > > > > > > > > > > > > > > > 10000:e1:00.0 Non-Volatile memory controller: Sandisk Corp WD Blue > > > > > > > > > SN550 > > > > > > > > > NVMe SSD (rev 01) (prog-if 02 [NVM Express]) > > > > > > > > > ... > > > > > > > > > Capabilities: [900 v1] L1 PM Substates > > > > > > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > > > > > > > > L1_PM_Substates+ > > > > > > > > > PortCommonModeRestoreTime=32us PortTPowerOnTime=10us > > > > > > > > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > > > > > > > > T_CommonMode=0us LTR1.2_Threshold=0ns > > > > > > > > > L1SubCtl2: T_PwrOn=10us > > > > > > > > > > > > > > > > > > According to "PCIe r6.0, sec 5.5.4", before enabling ASPM L1.2 on the > > > > > > > > > PCIe > > > > > > > > > Root Port and the child NVMe, they should be programmed with the same > > > > > > > > > LTR1.2_Threshold value. However, they have different values in this > > > > > > > > > case. > > > > > > > > > > > > > > > > > > Invoke aspm_calc_l12_info() to program the L1.2 parameters properly > > > > > > > > > before > > > > > > > > > enable L1.2 bits of L1 PM Substates Control Register in > > > > > > > > > __pci_enable_link_state(). > > > > > > > > > > > > > > > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218394 > > > > > > > > > Signed-off-by: Jian-Hong Pan <jhp@endlessos.org> > > > > > > > > > --- > > > > > > > > > v2: > > > > > > > > > - Prepare the PCIe LTR parameters before enable L1 Substates > > > > > > > > > > > > > > > > > > v3: > > > > > > > > > - Only enable supported features for the L1 Substates part > > > > > > > > > > > > > > > > > > v4: > > > > > > > > > - Focus on fixing L1.2 parameters, instead of re-initializing whole > > > > > > > > > L1SS > > > > > > > > > > > > > > > > > > v5: > > > > > > > > > - Fix typo and commit message > > > > > > > > > - Split introducing aspm_get_l1ss_cap() to "PCI/ASPM: Introduce > > > > > > > > > aspm_get_l1ss_cap()" > > > > > > > > > > > > > > > > > > v6: > > > > > > > > > - Skipped > > > > > > > > > > > > > > > > > > v7: > > > > > > > > > - Pick back and rebase on the new version kernel > > > > > > > > > - Drop the link state flag check. And, always config link state's > > > > > > > > > timing > > > > > > > > > parameters > > > > > > > > > > > > > > > > > > v8: > > > > > > > > > - Because pcie_aspm_get_link() might return the link as NULL, move > > > > > > > > > getting the link's parent and child devices after check the link is > > > > > > > > > not NULL. This avoids NULL memory access. > > > > > > > > > > > > > > > > > > drivers/pci/pcie/aspm.c | 15 +++++++++++++++ > > > > > > > > > 1 file changed, 15 insertions(+) > > > > > > > > > > > > > > > > > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c > > > > > > > > > index 5db1044c9895..55ff1d26fcea 100644 > > > > > > > > > --- a/drivers/pci/pcie/aspm.c > > > > > > > > > +++ b/drivers/pci/pcie/aspm.c > > > > > > > > > @@ -1411,9 +1411,15 @@ EXPORT_SYMBOL(pci_disable_link_state); > > > > > > > > > static int __pci_enable_link_state(struct pci_dev *pdev, int state, > > > > > > > > > bool > > > > > > > > > locked) > > > > > > > > > { > > > > > > > > > struct pcie_link_state *link = pcie_aspm_get_link(pdev); > > > > > > > > > + u32 parent_l1ss_cap, child_l1ss_cap; > > > > > > > > > + struct pci_dev *parent, *child; > > > > > > > > > > > > > > > > > > if (!link) > > > > > > > > > return -EINVAL; > > > > > > > > > + > > > > > > > > > + parent = link->pdev; > > > > > > > > > + child = link->downstream; > > > > > > > > > + > > > > > > > > > /* > > > > > > > > > * A driver requested that ASPM be enabled on this device, but > > > > > > > > > * if we don't have permission to manage ASPM (e.g., on ACPI > > > > > > > > > @@ -1428,6 +1434,15 @@ static int __pci_enable_link_state(struct > > > > > > > > > pci_dev > > > > > > > > > *pdev, int state, bool locked) > > > > > > > > > if (!locked) > > > > > > > > > down_read(&pci_bus_sem); > > > > > > > > > mutex_lock(&aspm_lock); > > > > > > > > > + /* > > > > > > > > > + * Ensure L1.2 parameters: Common_Mode_Restore_Times, > > > > > > > > > T_POWER_ON and > > > > > > > > > + * LTR_L1.2_THRESHOLD are programmed properly before enable > > > > > > > > > bits for > > > > > > > > > + * L1.2, per PCIe r6.0, sec 5.5.4. > > > > > > > > > + */ > > > > > > > > > + parent_l1ss_cap = aspm_get_l1ss_cap(parent); > > > > > > > > > + child_l1ss_cap = aspm_get_l1ss_cap(child); > > > > > > > > > + aspm_calc_l12_info(link, parent_l1ss_cap, child_l1ss_cap); > > > > > > > > > > > > > > I still don't think this is the place to recalculate the L1.2 parameters > > > > > > > especially when know the calculation was done but was cleared by > > > > > > > pci_bus_reset(). Can't we just do a pci_save/restore_state() before/after > > > > > > > pci_bus_reset() in vmd.c? > > > > > > > > > > > > I have not thought pci_save/restore_state() around pci_bus_reset() > > > > > > before. It is an interesting direction. > > > > > > > > > > > > So, I prepare modification below for test. Include "[PATCH v8 1/4] > > > > > > PCI: vmd: Set PCI devices to D0 before enable PCI PM's L1 substates", > > > > > > too. Then, both the PCIe bridge and the PCIe device have the same > > > > > > LTR_L1.2_THRESHOLD 101376ns as expected. > > > > > > > > > > > > diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c > > > > > > index bbf4a47e7b31..6b8dd4f30127 100644 > > > > > > --- a/drivers/pci/controller/vmd.c > > > > > > +++ b/drivers/pci/controller/vmd.c > > > > > > @@ -727,6 +727,18 @@ static void vmd_copy_host_bridge_flags(struct > > > > > > pci_host_bridge *root_bridge, > > > > > > vmd_bridge->native_dpc = root_bridge->native_dpc; > > > > > > } > > > > > > > > > > > > +static int vmd_pci_save_state(struct pci_dev *pdev, void *userdata) > > > > > > +{ > > > > > > + pci_save_state(pdev); > > > > > > + return 0; > > > > > > +} > > > > > > + > > > > > > +static int vmd_pci_restore_state(struct pci_dev *pdev, void *userdata) > > > > > > +{ > > > > > > + pci_restore_state(pdev); > > > > > > + return 0; > > > > > > +} > > > > > > + > > > > > > /* > > > > > > * Enable ASPM and LTR settings on devices that aren't configured by BIOS. > > > > > > */ > > > > > > @@ -927,6 +939,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, > > > > > > unsigned long features) > > > > > > pci_scan_child_bus(vmd->bus); > > > > > > vmd_domain_reset(vmd); > > > > > > > > > > > > + pci_walk_bus(vmd->bus, vmd_pci_save_state, NULL); > > > > > > /* When Intel VMD is enabled, the OS does not discover the Root > > > > > > Ports > > > > > > * owned by Intel VMD within the MMCFG space. pci_reset_bus() > > > > > > applies > > > > > > * a reset to the parent of the PCI device supplied as argument. > > > > > > This > > > > > > @@ -945,6 +958,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, > > > > > > unsigned long features) > > > > > > break; > > > > > > } > > > > > > } > > > > > > + pci_walk_bus(vmd->bus, vmd_pci_restore_state, NULL); > > > > > > > > > > Why not call pci_reset_bus() (or __pci_reset_bus()) then in > > > > > vmd_enable_domain() which preserves state unlike pci_reset_bus()? > > > > > > > > > > (Don't tell me naming of these functions is a horrible mess. :-/) > > > > > > > > Hmm. So this *is* calling pci_reset_bus(). > > > > > > Yeah, I managed to get confused by the names myself, I somehow > > > ended up thinking it calls pci_bus_reset() which is not correct... > > > > > > > L1.2 configuration has specific > > > > ordering requirements for changes to parent & child devices. Could be why it's > > > > not getting restored properly. > > > > > > Indeed, it has to be something else since the patch above doesn't even > > > restore anything because dev->state_saved should get set to false by the > > > first pci_restore_state() called from > > > __pci_reset_bus() -> pci_bus_restore_locked() -> pci_dev_restore(), I > > > think!? > > > > Inspired by Ilpo's comment. I add some debug messages based on > > linux-next's tag 'next-20240809' to understand the code path of > > pci_reset_bus(): > > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > > index ffaaca0978cb..3ee71374f1de 100644 > > --- a/drivers/pci/pci.c > > +++ b/drivers/pci/pci.c > > @@ -5133,8 +5133,10 @@ static void pci_dev_save_and_disable(struct pci_dev *dev) > > * races with ->remove() by the device lock, which must be held by > > * the caller. > > */ > > - if (err_handler && err_handler->reset_prepare) > > + if (err_handler && err_handler->reset_prepare) { > > + pci_info(dev, "%s: %pF\n", __func__, > > err_handler->reset_prepare); > > err_handler->reset_prepare(dev); > > + } > > > > /* > > * Wake-up device prior to save. PM registers default to D0 after > > @@ -5144,6 +5146,7 @@ static void pci_dev_save_and_disable(struct pci_dev *dev) > > pci_set_power_state(dev, PCI_D0); > > > > pci_save_state(dev); > > + pci_info(dev, "%s: PCI state_saved is %s\n", __func__, > > dev->state_saved ? "true" : "false"); > > /* > > * Disable the device by clearing the Command register, except for > > * INTx-disable which is set. This not only disables MMIO and I/O port > > @@ -5655,6 +5658,10 @@ static void > > pci_bus_save_and_disable_locked(struct pci_bus *bus) > > struct pci_dev *dev; > > > > list_for_each_entry(dev, &bus->devices, bus_list) { > > + pci_info(dev, "%s: PCI state_saved is %s, and %s subordinate\n", > > + __func__, > > + dev->state_saved ? "true" : "false", > > + dev->subordinate ? "has" : "does not have"); > > pci_dev_save_and_disable(dev); > > if (dev->subordinate) > > pci_bus_save_and_disable_locked(dev->subordinate); > > @@ -5671,6 +5678,10 @@ static void pci_bus_restore_locked(struct pci_bus *bus) > > struct pci_dev *dev; > > > > list_for_each_entry(dev, &bus->devices, bus_list) { > > + pci_info(dev, "%s: PCI state_saved is %s, and %s subordinate\n", > > + __func__, > > + dev->state_saved ? "true" : "false", > > + dev->subordinate ? "has" : "does not have"); > > pci_dev_restore(dev); > > if (dev->subordinate) > > pci_bus_restore_locked(dev->subordinate); > > @@ -5786,8 +5797,10 @@ static int pci_bus_reset(struct pci_bus *bus, bool probe) > > if (!bus->self || !pci_bus_resettable(bus)) > > return -ENOTTY; > > > > - if (probe) > > + if (probe) { > > + pci_info(bus->self, "%s: probe is true. So return 0 > > directly", __func__); > > return 0; > > + } > > > > pci_bus_lock(bus); > > > > @@ -5858,10 +5871,12 @@ static int __pci_reset_bus(struct pci_bus *bus) > > int rc; > > > > rc = pci_bus_reset(bus, PCI_RESET_PROBE); > > + pci_info(bus->self, "%s: pci_bus_reset() returns %d\n", __func__, rc); > > if (rc) > > return rc; > > > > if (pci_bus_trylock(bus)) { > > + pci_info(bus->self, "%s: pci_bus_trylock() returns > > true\n", __func__); > > pci_bus_save_and_disable_locked(bus); > > might_sleep(); > > rc = pci_bridge_secondary_bus_reset(bus->self); > > @@ -5881,6 +5896,7 @@ static int __pci_reset_bus(struct pci_bus *bus) > > */ > > int pci_reset_bus(struct pci_dev *pdev) > > { > > + pci_info(pdev, "%s: %s", __func__, > > !pci_probe_reset_slot(pdev->slot) ? "true" : "false"); > > return (!pci_probe_reset_slot(pdev->slot)) ? > > __pci_reset_slot(pdev->slot) : __pci_reset_bus(pdev->bus); > > } > > > > And, have the information of VMD PCIe devices with the built kernel: > > > > 10000:e0:06.0 PCI bridge [0604]: Intel Corporation 11th Gen Core > > Processor PCIe Controller [8086:9a09] (rev 01) (prog-if 00 [Normal > > decode]) > > ... > > Capabilities: [200 v1] L1 PM Substates > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+ > > PortCommonModeRestoreTime=45us PortTPowerOnTime=50us > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > T_CommonMode=0us LTR1.2_Threshold=0ns > > L1SubCtl2: T_PwrOn=0us > > > > 10000:e1:00.0 Non-Volatile memory controller [0108]: Sandisk Corp WD > > Blue SN550 NVMe SSD [15b7:5009] (rev 01) (prog-if 02 [NVM Express]) > > ... > > Capabilities: [900 v1] L1 PM Substates > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+ > > PortCommonModeRestoreTime=32us PortTPowerOnTime=10us > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > T_CommonMode=0us LTR1.2_Threshold=101376ns > > L1SubCtl2: T_PwrOn=50us > > > > We can see the NVMe has expected LTR1.2_Threshold=101376ns, but the > > PCIe bridge has LTR1.2_Threshold=0ns. > > This is now the other way around as in the original posting that had > 0ns for 10000:e1:00.0 ?? > > Is this behavior even consistent or did you e.g. mess up some copy > pasting somewhere? The original posting came with older kernel 6.5. It shows: 10000:e0:06.0 PCI bridge [0604]: Intel Corporation 11th Gen Core Processor PCIe Controller [8086:9a09] (rev 01) (prog-if 00 [Normal decode]) ... Capabilities: [200 v1] L1 PM Substates L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+ PortCommonModeRestoreTime=45us PortTPowerOnTime=50us L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- T_CommonMode=45us LTR1.2_Threshold=101376ns L1SubCtl2: T_PwrOn=50us ... 10000:e1:00.0 Non-Volatile memory controller [0108]: Sandisk Corp WD Blue SN550 NVMe SSD [15b7:5009] (rev 01) (prog-if 02 [NVM Express]) ... Capabilities: [900 v1] L1 PM Substates L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+ PortCommonModeRestoreTime=32us PortTPowerOnTime=10us L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- T_CommonMode=0us LTR1.2_Threshold=0ns L1SubCtl2: T_PwrOn=10us ... Full information: https://gist.github.com/starnight/e19487a44efefff477f9ac9ed641c183 But, newer kernel, for example linux-next next-20240809 and next-20240820 which I have tried shows: 10000:e0:06.0 PCI bridge [0604]: Intel Corporation 11th Gen Core Processor PCIe Controller [8086:9a09] (rev 01) (prog-if 00 [Normal decode]) ... Capabilities: [200 v1] L1 PM Substates L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+ PortCommonModeRestoreTime=45us PortTPowerOnTime=50us L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- T_CommonMode=0us LTR1.2_Threshold=0ns L1SubCtl2: T_PwrOn=0us ... 10000:e1:00.0 Non-Volatile memory controller [0108]: Sandisk Corp WD Blue SN550 NVMe SSD [15b7:5009] (rev 01) (prog-if 02 [NVM Express]) ... Capabilities: [900 v1] L1 PM Substates L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+ PortCommonModeRestoreTime=32us PortTPowerOnTime=10us L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- T_CommonMode=0us LTR1.2_Threshold=101376ns L1SubCtl2: T_PwrOn=50us ... Full information: https://gist.github.com/starnight/081ea4adbce40a27faf234e5e135b49a So, according to the information above, different kernel versions show different L1 sub-states. Jian-Hong Pan
On Mon, 12 Aug 2024 16:18:22 +0800 Jian-Hong Pan <jhp@endlessos.org> wrote: > Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> 於 2024年8月8日 週四 > 下午5:49寫道: > > > > On Wed, 7 Aug 2024, David E. Box wrote: > > > > > On Wed, 2024-08-07 at 14:18 +0300, Ilpo Järvinen wrote: > > > > On Wed, 7 Aug 2024, Jian-Hong Pan wrote: > > > > > > > > > David E. Box <david.e.box@linux.intel.com> 於 2024年8月6日 > > > > > 週二 上午4:26寫道: > > > > > > > > > > > > Hi Jian-Hong, > > > > > > > > > > > > On Fri, 2024-08-02 at 16:24 +0800, Jian-Hong Pan wrote: > > > > > > > Jian-Hong Pan <jhp@endlessos.org> 於 2024年7月19日 週五 > > > > > > > 下午4:04寫道: > > > > > > > > > > > > > > > > Currently, when enable link's L1.2 features with > > > > > > > > __pci_enable_link_state(), > > > > > > > > it configs the link directly without ensuring related > > > > > > > > L1.2 parameters, such > > > > > > > > as T_POWER_ON, Common_Mode_Restore_Time, and > > > > > > > > LTR_L1.2_THRESHOLD have been > > > > > > > > programmed. > > > > > > > > > > > > > > > > This leads the link's L1.2 between PCIe Root Port and > > > > > > > > child device gets > > > > > > > > wrong configs when a caller tries to enabled it. > > > > > > > > > > > > > > > > Here is a failed example on ASUS B1400CEAE with enabled > > > > > > > > VMD: > > > > > > > > > > > > > > > > 10000:e0:06.0 PCI bridge: Intel Corporation 11th Gen > > > > > > > > Core Processor PCIe > > > > > > > > Controller (rev 01) (prog-if 00 [Normal decode]) > > > > > > > > ... > > > > > > > > Capabilities: [200 v1] L1 PM Substates > > > > > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ > > > > > > > > ASPM_L1.1+ L1_PM_Substates+ > > > > > > > > PortCommonModeRestoreTime=45us > > > > > > > > PortTPowerOnTime=50us L1SubCtl1: PCI-PM_L1.2- > > > > > > > > PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- T_CommonMode=45us > > > > > > > > LTR1.2_Threshold=101376ns L1SubCtl2: T_PwrOn=50us > > > > > > > > > > > > > > > > 10000:e1:00.0 Non-Volatile memory controller: Sandisk > > > > > > > > Corp WD Blue SN550 > > > > > > > > NVMe SSD (rev 01) (prog-if 02 [NVM Express]) > > > > > > > > ... > > > > > > > > Capabilities: [900 v1] L1 PM Substates > > > > > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ > > > > > > > > ASPM_L1.1- L1_PM_Substates+ > > > > > > > > PortCommonModeRestoreTime=32us > > > > > > > > PortTPowerOnTime=10us L1SubCtl1: PCI-PM_L1.2- > > > > > > > > PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- T_CommonMode=0us > > > > > > > > LTR1.2_Threshold=0ns L1SubCtl2: T_PwrOn=10us > > > > > > > > > > > > > > > > According to "PCIe r6.0, sec 5.5.4", before enabling > > > > > > > > ASPM L1.2 on the PCIe > > > > > > > > Root Port and the child NVMe, they should be programmed > > > > > > > > with the same LTR1.2_Threshold value. However, they > > > > > > > > have different values in this case. > > > > > > > > > > > > > > > > Invoke aspm_calc_l12_info() to program the L1.2 > > > > > > > > parameters properly before > > > > > > > > enable L1.2 bits of L1 PM Substates Control Register in > > > > > > > > __pci_enable_link_state(). > > > > > > > > > > > > > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218394 > > > > > > > > Signed-off-by: Jian-Hong Pan <jhp@endlessos.org> > > > > > > > > --- > > > > > > > > v2: > > > > > > > > - Prepare the PCIe LTR parameters before enable L1 > > > > > > > > Substates > > > > > > > > > > > > > > > > v3: > > > > > > > > - Only enable supported features for the L1 Substates > > > > > > > > part > > > > > > > > > > > > > > > > v4: > > > > > > > > - Focus on fixing L1.2 parameters, instead of > > > > > > > > re-initializing whole L1SS > > > > > > > > > > > > > > > > v5: > > > > > > > > - Fix typo and commit message > > > > > > > > - Split introducing aspm_get_l1ss_cap() to "PCI/ASPM: > > > > > > > > Introduce aspm_get_l1ss_cap()" > > > > > > > > > > > > > > > > v6: > > > > > > > > - Skipped > > > > > > > > > > > > > > > > v7: > > > > > > > > - Pick back and rebase on the new version kernel > > > > > > > > - Drop the link state flag check. And, always config > > > > > > > > link state's timing > > > > > > > > parameters > > > > > > > > > > > > > > > > v8: > > > > > > > > - Because pcie_aspm_get_link() might return the link as > > > > > > > > NULL, move getting the link's parent and child devices > > > > > > > > after check the link is not NULL. This avoids NULL > > > > > > > > memory access. > > > > > > > > > > > > > > > > drivers/pci/pcie/aspm.c | 15 +++++++++++++++ > > > > > > > > 1 file changed, 15 insertions(+) > > > > > > > > > > > > > > > > diff --git a/drivers/pci/pcie/aspm.c > > > > > > > > b/drivers/pci/pcie/aspm.c index > > > > > > > > 5db1044c9895..55ff1d26fcea 100644 --- > > > > > > > > a/drivers/pci/pcie/aspm.c +++ b/drivers/pci/pcie/aspm.c > > > > > > > > @@ -1411,9 +1411,15 @@ > > > > > > > > EXPORT_SYMBOL(pci_disable_link_state); static int > > > > > > > > __pci_enable_link_state(struct pci_dev *pdev, int > > > > > > > > state, bool locked) > > > > > > > > { > > > > > > > > struct pcie_link_state *link = > > > > > > > > pcie_aspm_get_link(pdev); > > > > > > > > + u32 parent_l1ss_cap, child_l1ss_cap; > > > > > > > > + struct pci_dev *parent, *child; > > > > > > > > > > > > > > > > if (!link) > > > > > > > > return -EINVAL; > > > > > > > > + > > > > > > > > + parent = link->pdev; > > > > > > > > + child = link->downstream; > > > > > > > > + > > > > > > > > /* > > > > > > > > * A driver requested that ASPM be enabled on > > > > > > > > this device, but > > > > > > > > * if we don't have permission to manage ASPM > > > > > > > > (e.g., on ACPI @@ -1428,6 +1434,15 @@ static int > > > > > > > > __pci_enable_link_state(struct pci_dev > > > > > > > > *pdev, int state, bool locked) > > > > > > > > if (!locked) > > > > > > > > down_read(&pci_bus_sem); > > > > > > > > mutex_lock(&aspm_lock); > > > > > > > > + /* > > > > > > > > + * Ensure L1.2 parameters: > > > > > > > > Common_Mode_Restore_Times, T_POWER_ON and > > > > > > > > + * LTR_L1.2_THRESHOLD are programmed properly > > > > > > > > before enable bits for > > > > > > > > + * L1.2, per PCIe r6.0, sec 5.5.4. > > > > > > > > + */ > > > > > > > > + parent_l1ss_cap = aspm_get_l1ss_cap(parent); > > > > > > > > + child_l1ss_cap = aspm_get_l1ss_cap(child); > > > > > > > > + aspm_calc_l12_info(link, parent_l1ss_cap, > > > > > > > > child_l1ss_cap); > > > > > > > > > > > > I still don't think this is the place to recalculate the > > > > > > L1.2 parameters especially when know the calculation was > > > > > > done but was cleared by pci_bus_reset(). Can't we just do a > > > > > > pci_save/restore_state() before/after pci_bus_reset() in > > > > > > vmd.c? > > > > > > > > > > I have not thought pci_save/restore_state() around > > > > > pci_bus_reset() before. It is an interesting direction. > > > > > > > > > > So, I prepare modification below for test. Include "[PATCH > > > > > v8 1/4] PCI: vmd: Set PCI devices to D0 before enable PCI > > > > > PM's L1 substates", too. Then, both the PCIe bridge and the > > > > > PCIe device have the same LTR_L1.2_THRESHOLD 101376ns as > > > > > expected. > > > > > > > > > > diff --git a/drivers/pci/controller/vmd.c > > > > > b/drivers/pci/controller/vmd.c index > > > > > bbf4a47e7b31..6b8dd4f30127 100644 --- > > > > > a/drivers/pci/controller/vmd.c +++ > > > > > b/drivers/pci/controller/vmd.c @@ -727,6 +727,18 @@ static > > > > > void vmd_copy_host_bridge_flags(struct pci_host_bridge > > > > > *root_bridge, vmd_bridge->native_dpc = > > > > > root_bridge->native_dpc; } > > > > > > > > > > +static int vmd_pci_save_state(struct pci_dev *pdev, void > > > > > *userdata) +{ > > > > > + pci_save_state(pdev); > > > > > + return 0; > > > > > +} > > > > > + > > > > > +static int vmd_pci_restore_state(struct pci_dev *pdev, void > > > > > *userdata) +{ > > > > > + pci_restore_state(pdev); > > > > > + return 0; > > > > > +} > > > > > + > > > > > /* > > > > > * Enable ASPM and LTR settings on devices that aren't > > > > > configured by BIOS. */ > > > > > @@ -927,6 +939,7 @@ static int vmd_enable_domain(struct > > > > > vmd_dev *vmd, unsigned long features) > > > > > pci_scan_child_bus(vmd->bus); > > > > > vmd_domain_reset(vmd); > > > > > > > > > > + pci_walk_bus(vmd->bus, vmd_pci_save_state, NULL); > > > > > /* When Intel VMD is enabled, the OS does not > > > > > discover the Root Ports > > > > > * owned by Intel VMD within the MMCFG space. > > > > > pci_reset_bus() applies > > > > > * a reset to the parent of the PCI device supplied > > > > > as argument. This > > > > > @@ -945,6 +958,7 @@ static int vmd_enable_domain(struct > > > > > vmd_dev *vmd, unsigned long features) > > > > > break; > > > > > } > > > > > } > > > > > + pci_walk_bus(vmd->bus, vmd_pci_restore_state, NULL); > > > > > > > > Why not call pci_reset_bus() (or __pci_reset_bus()) then in > > > > vmd_enable_domain() which preserves state unlike > > > > pci_reset_bus()? > > > > > > > > (Don't tell me naming of these functions is a horrible mess. > > > > :-/) > > > > > > Hmm. So this *is* calling pci_reset_bus(). > > > > Yeah, I managed to get confused by the names myself, I somehow > > ended up thinking it calls pci_bus_reset() which is not correct... > > > > > L1.2 configuration has specific > > > ordering requirements for changes to parent & child devices. > > > Could be why it's not getting restored properly. > > > > Indeed, it has to be something else since the patch above doesn't > > even restore anything because dev->state_saved should get set to > > false by the first pci_restore_state() called from > > __pci_reset_bus() -> pci_bus_restore_locked() -> pci_dev_restore(), > > I think!? > > Inspired by Ilpo's comment. I add some debug messages based on > linux-next's tag 'next-20240809' to understand the code path of > pci_reset_bus(): > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index ffaaca0978cb..3ee71374f1de 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -5133,8 +5133,10 @@ static void pci_dev_save_and_disable(struct > pci_dev *dev) > * races with ->remove() by the device lock, which must be > held by > * the caller. > */ > - if (err_handler && err_handler->reset_prepare) > + if (err_handler && err_handler->reset_prepare) { > + pci_info(dev, "%s: %pF\n", __func__, > err_handler->reset_prepare); > err_handler->reset_prepare(dev); > + } > > /* > * Wake-up device prior to save. PM registers default to D0 > after @@ -5144,6 +5146,7 @@ static void > pci_dev_save_and_disable(struct pci_dev *dev) > pci_set_power_state(dev, PCI_D0); > > pci_save_state(dev); > + pci_info(dev, "%s: PCI state_saved is %s\n", __func__, > dev->state_saved ? "true" : "false"); > /* > * Disable the device by clearing the Command register, > except for > * INTx-disable which is set. This not only disables MMIO > and I/O port @@ -5655,6 +5658,10 @@ static void > pci_bus_save_and_disable_locked(struct pci_bus *bus) > struct pci_dev *dev; > > list_for_each_entry(dev, &bus->devices, bus_list) { > + pci_info(dev, "%s: PCI state_saved is %s, and %s > subordinate\n", > + __func__, > + dev->state_saved ? "true" : "false", > + dev->subordinate ? "has" : "does not have"); > pci_dev_save_and_disable(dev); > if (dev->subordinate) > pci_bus_save_and_disable_locked(dev->subordinate); > @@ -5671,6 +5678,10 @@ static void pci_bus_restore_locked(struct > pci_bus *bus) struct pci_dev *dev; > > list_for_each_entry(dev, &bus->devices, bus_list) { > + pci_info(dev, "%s: PCI state_saved is %s, and %s > subordinate\n", > + __func__, > + dev->state_saved ? "true" : "false", > + dev->subordinate ? "has" : "does not have"); > pci_dev_restore(dev); > if (dev->subordinate) > pci_bus_restore_locked(dev->subordinate); > @@ -5786,8 +5797,10 @@ static int pci_bus_reset(struct pci_bus *bus, > bool probe) if (!bus->self || !pci_bus_resettable(bus)) > return -ENOTTY; > > - if (probe) > + if (probe) { > + pci_info(bus->self, "%s: probe is true. So return 0 > directly", __func__); > return 0; > + } > > pci_bus_lock(bus); > > @@ -5858,10 +5871,12 @@ static int __pci_reset_bus(struct pci_bus > *bus) int rc; > > rc = pci_bus_reset(bus, PCI_RESET_PROBE); > + pci_info(bus->self, "%s: pci_bus_reset() returns %d\n", > __func__, rc); if (rc) > return rc; > > if (pci_bus_trylock(bus)) { > + pci_info(bus->self, "%s: pci_bus_trylock() returns > true\n", __func__); > pci_bus_save_and_disable_locked(bus); > might_sleep(); > rc = pci_bridge_secondary_bus_reset(bus->self); > @@ -5881,6 +5896,7 @@ static int __pci_reset_bus(struct pci_bus *bus) > */ > int pci_reset_bus(struct pci_dev *pdev) > { > + pci_info(pdev, "%s: %s", __func__, > !pci_probe_reset_slot(pdev->slot) ? "true" : "false"); > return (!pci_probe_reset_slot(pdev->slot)) ? > __pci_reset_slot(pdev->slot) : __pci_reset_bus(pdev->bus); > } > > And, have the information of VMD PCIe devices with the built kernel: > > 10000:e0:06.0 PCI bridge [0604]: Intel Corporation 11th Gen Core > Processor PCIe Controller [8086:9a09] (rev 01) (prog-if 00 [Normal > decode]) > ... > Capabilities: [200 v1] L1 PM Substates > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ > L1_PM_Substates+ PortCommonModeRestoreTime=45us PortTPowerOnTime=50us > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > T_CommonMode=0us LTR1.2_Threshold=0ns > L1SubCtl2: T_PwrOn=0us > > 10000:e1:00.0 Non-Volatile memory controller [0108]: Sandisk Corp WD > Blue SN550 NVMe SSD [15b7:5009] (rev 01) (prog-if 02 [NVM Express]) > ... > Capabilities: [900 v1] L1 PM Substates > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > L1_PM_Substates+ PortCommonModeRestoreTime=32us PortTPowerOnTime=10us > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > T_CommonMode=0us LTR1.2_Threshold=101376ns > L1SubCtl2: T_PwrOn=50us > > We can see the NVMe has expected LTR1.2_Threshold=101376ns, but the > PCIe bridge has LTR1.2_Threshold=0ns. > > Then, check the dmesg. I notice the debug messages: > > pci 10000:e0:06.0: PCI bridge to [bus e1] > pci 10000:e0:06.0: Primary bus is hard wired to 0 > pci 10000:e1:00.0: pci_reset_bus: false > pci 10000:e0:06.0: pci_bus_reset: probe is true. So return 0 directly > pci 10000:e0:06.0: __pci_reset_bus: pci_bus_reset() returns 0 > pci 10000:e0:06.0: __pci_reset_bus: pci_bus_trylock() returns true > pci 10000:e1:00.0: pci_bus_save_and_disable_locked: PCI state_saved is > false, and does not have subordinate > pci 10000:e1:00.0: pci_dev_save_and_disable: PCI state_saved is true > Freeing initrd memory: 75236K > pci 10000:e1:00.0: pci_bus_restore_locked: PCI state_saved is true, > and does not have subordinate > > So, the code path is: > > vmd_enable_domain() > pci_reset_bus() > __pci_reset_bus() > pci_bus_reset() > pci_bus_save_and_disable_locked() > pci_dev_save_and_disable() > pci_bus_restore_locked() > pci_dev_restore() > > And, from the debug messages, I learned only NVMe 10000:e1:00.0 does > pci_save/restore_state. But, the PCIe bridge 10000:e0:06.0 does not. > So, PCIe bridge 10000:e0:06.0 does not restore state correctly. > > Besides, it is NVMe 10000:e1:00.0's bus [e1] been reset, not the VMD's > bus in vmd_enable_domain(). > * Bus "e1" has only NVMe 10000:e1:00.0 > * VMD's bus in vmd_enable_domain() has PCIe bridge 10000:e0:06.0, NVMe > 10000:e1:00.0 and SATA Controller 10000:e0:17.0. > > Here is the PCI tree: > > -+-[0000:00]-+-00.0 Intel Corporation Device 9a04 > | +-02.0 Intel Corporation Tiger Lake-LP GT2 [UHD > Graphics G4] | +-04.0 Intel Corporation TigerLake-LP > Dynamic Tuning Processor Participant > | +-06.0 Intel Corporation RST VMD Managed Controller > | +-07.0-[01-2b]-- > | +-08.0 Intel Corporation GNA Scoring Accelerator module > | +-0a.0 Intel Corporation Tigerlake Telemetry Aggregator > Driver | +-0d.0 Intel Corporation Tiger Lake-LP > Thunderbolt 4 USB Controller > | +-0d.2 Intel Corporation Tiger Lake-LP Thunderbolt 4 > NHI #0 | +-0e.0 Intel Corporation Volume Management Device > NVMe RAID Controller > | +-14.0 Intel Corporation Tiger Lake-LP USB 3.2 Gen 2x1 > xHCI Host Controller > | +-14.2 Intel Corporation Tiger Lake-LP Shared SRAM > | +-14.3 Intel Corporation Wi-Fi 6 AX201 > | +-15.0 Intel Corporation Tiger Lake-LP Serial IO I2C > Controller #0 | +-15.1 Intel Corporation Tiger Lake-LP > Serial IO I2C Controller #1 | +-16.0 Intel Corporation > Tiger Lake-LP Management Engine Interface | +-17.0 Intel > Corporation RST VMD Managed Controller | +-1f.0 Intel > Corporation Tiger Lake-LP LPC Controller | +-1f.3 Intel > Corporation Tiger Lake-LP Smart Sound Technology Audio Controller > | +-1f.4 Intel Corporation Tiger Lake-LP SMBus Controller > | +-1f.5 Intel Corporation Tiger Lake-LP SPI Controller > | \-1f.6 Intel Corporation Ethernet Connection (13) I219-V > \-[10000:e0]-+-06.0-[e1]----00.0 Sandisk Corp WD Blue SN550 NVMe SSD > \-17.0 Intel Corporation Tiger Lake-LP SATA Controller > > According the findings above, to ensure the devices on the VMD bus > have correctly states, seems pci_save_state() all the devices before > pci_reset_bus(), and pci_restore_state() all the devices after > pci_reset_bus() is the correct answer. What happens if you call pci_reset_bus with PCIe bridge 10000:e0:06.0 instead of NVMe 10000:e1:00.0? I believe the current implementation in vmd_enable_domain finds first child device on each rootport and calls pci_reset_bus with NVMe. -nirmal > > Jian-Hong Pan
Nirmal Patel <nirmal.patel@linux.intel.com> 於 2024年9月3日 週二 下午11:17寫道: > > On Mon, 12 Aug 2024 16:18:22 +0800 > Jian-Hong Pan <jhp@endlessos.org> wrote: > > > Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> 於 2024年8月8日 週四 > > 下午5:49寫道: > > > > > > On Wed, 7 Aug 2024, David E. Box wrote: > > > > > > > On Wed, 2024-08-07 at 14:18 +0300, Ilpo Järvinen wrote: > > > > > On Wed, 7 Aug 2024, Jian-Hong Pan wrote: > > > > > > > > > > > David E. Box <david.e.box@linux.intel.com> 於 2024年8月6日 > > > > > > 週二 上午4:26寫道: > > > > > > > > > > > > > > Hi Jian-Hong, > > > > > > > > > > > > > > On Fri, 2024-08-02 at 16:24 +0800, Jian-Hong Pan wrote: > > > > > > > > Jian-Hong Pan <jhp@endlessos.org> 於 2024年7月19日 週五 > > > > > > > > 下午4:04寫道: > > > > > > > > > > > > > > > > > > Currently, when enable link's L1.2 features with > > > > > > > > > __pci_enable_link_state(), > > > > > > > > > it configs the link directly without ensuring related > > > > > > > > > L1.2 parameters, such > > > > > > > > > as T_POWER_ON, Common_Mode_Restore_Time, and > > > > > > > > > LTR_L1.2_THRESHOLD have been > > > > > > > > > programmed. > > > > > > > > > > > > > > > > > > This leads the link's L1.2 between PCIe Root Port and > > > > > > > > > child device gets > > > > > > > > > wrong configs when a caller tries to enabled it. > > > > > > > > > > > > > > > > > > Here is a failed example on ASUS B1400CEAE with enabled > > > > > > > > > VMD: > > > > > > > > > > > > > > > > > > 10000:e0:06.0 PCI bridge: Intel Corporation 11th Gen > > > > > > > > > Core Processor PCIe > > > > > > > > > Controller (rev 01) (prog-if 00 [Normal decode]) > > > > > > > > > ... > > > > > > > > > Capabilities: [200 v1] L1 PM Substates > > > > > > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ > > > > > > > > > ASPM_L1.1+ L1_PM_Substates+ > > > > > > > > > PortCommonModeRestoreTime=45us > > > > > > > > > PortTPowerOnTime=50us L1SubCtl1: PCI-PM_L1.2- > > > > > > > > > PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- T_CommonMode=45us > > > > > > > > > LTR1.2_Threshold=101376ns L1SubCtl2: T_PwrOn=50us > > > > > > > > > > > > > > > > > > 10000:e1:00.0 Non-Volatile memory controller: Sandisk > > > > > > > > > Corp WD Blue SN550 > > > > > > > > > NVMe SSD (rev 01) (prog-if 02 [NVM Express]) > > > > > > > > > ... > > > > > > > > > Capabilities: [900 v1] L1 PM Substates > > > > > > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ > > > > > > > > > ASPM_L1.1- L1_PM_Substates+ > > > > > > > > > PortCommonModeRestoreTime=32us > > > > > > > > > PortTPowerOnTime=10us L1SubCtl1: PCI-PM_L1.2- > > > > > > > > > PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- T_CommonMode=0us > > > > > > > > > LTR1.2_Threshold=0ns L1SubCtl2: T_PwrOn=10us > > > > > > > > > > > > > > > > > > According to "PCIe r6.0, sec 5.5.4", before enabling > > > > > > > > > ASPM L1.2 on the PCIe > > > > > > > > > Root Port and the child NVMe, they should be programmed > > > > > > > > > with the same LTR1.2_Threshold value. However, they > > > > > > > > > have different values in this case. > > > > > > > > > > > > > > > > > > Invoke aspm_calc_l12_info() to program the L1.2 > > > > > > > > > parameters properly before > > > > > > > > > enable L1.2 bits of L1 PM Substates Control Register in > > > > > > > > > __pci_enable_link_state(). > > > > > > > > > > > > > > > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218394 > > > > > > > > > Signed-off-by: Jian-Hong Pan <jhp@endlessos.org> > > > > > > > > > --- > > > > > > > > > v2: > > > > > > > > > - Prepare the PCIe LTR parameters before enable L1 > > > > > > > > > Substates > > > > > > > > > > > > > > > > > > v3: > > > > > > > > > - Only enable supported features for the L1 Substates > > > > > > > > > part > > > > > > > > > > > > > > > > > > v4: > > > > > > > > > - Focus on fixing L1.2 parameters, instead of > > > > > > > > > re-initializing whole L1SS > > > > > > > > > > > > > > > > > > v5: > > > > > > > > > - Fix typo and commit message > > > > > > > > > - Split introducing aspm_get_l1ss_cap() to "PCI/ASPM: > > > > > > > > > Introduce aspm_get_l1ss_cap()" > > > > > > > > > > > > > > > > > > v6: > > > > > > > > > - Skipped > > > > > > > > > > > > > > > > > > v7: > > > > > > > > > - Pick back and rebase on the new version kernel > > > > > > > > > - Drop the link state flag check. And, always config > > > > > > > > > link state's timing > > > > > > > > > parameters > > > > > > > > > > > > > > > > > > v8: > > > > > > > > > - Because pcie_aspm_get_link() might return the link as > > > > > > > > > NULL, move getting the link's parent and child devices > > > > > > > > > after check the link is not NULL. This avoids NULL > > > > > > > > > memory access. > > > > > > > > > > > > > > > > > > drivers/pci/pcie/aspm.c | 15 +++++++++++++++ > > > > > > > > > 1 file changed, 15 insertions(+) > > > > > > > > > > > > > > > > > > diff --git a/drivers/pci/pcie/aspm.c > > > > > > > > > b/drivers/pci/pcie/aspm.c index > > > > > > > > > 5db1044c9895..55ff1d26fcea 100644 --- > > > > > > > > > a/drivers/pci/pcie/aspm.c +++ b/drivers/pci/pcie/aspm.c > > > > > > > > > @@ -1411,9 +1411,15 @@ > > > > > > > > > EXPORT_SYMBOL(pci_disable_link_state); static int > > > > > > > > > __pci_enable_link_state(struct pci_dev *pdev, int > > > > > > > > > state, bool locked) > > > > > > > > > { > > > > > > > > > struct pcie_link_state *link = > > > > > > > > > pcie_aspm_get_link(pdev); > > > > > > > > > + u32 parent_l1ss_cap, child_l1ss_cap; > > > > > > > > > + struct pci_dev *parent, *child; > > > > > > > > > > > > > > > > > > if (!link) > > > > > > > > > return -EINVAL; > > > > > > > > > + > > > > > > > > > + parent = link->pdev; > > > > > > > > > + child = link->downstream; > > > > > > > > > + > > > > > > > > > /* > > > > > > > > > * A driver requested that ASPM be enabled on > > > > > > > > > this device, but > > > > > > > > > * if we don't have permission to manage ASPM > > > > > > > > > (e.g., on ACPI @@ -1428,6 +1434,15 @@ static int > > > > > > > > > __pci_enable_link_state(struct pci_dev > > > > > > > > > *pdev, int state, bool locked) > > > > > > > > > if (!locked) > > > > > > > > > down_read(&pci_bus_sem); > > > > > > > > > mutex_lock(&aspm_lock); > > > > > > > > > + /* > > > > > > > > > + * Ensure L1.2 parameters: > > > > > > > > > Common_Mode_Restore_Times, T_POWER_ON and > > > > > > > > > + * LTR_L1.2_THRESHOLD are programmed properly > > > > > > > > > before enable bits for > > > > > > > > > + * L1.2, per PCIe r6.0, sec 5.5.4. > > > > > > > > > + */ > > > > > > > > > + parent_l1ss_cap = aspm_get_l1ss_cap(parent); > > > > > > > > > + child_l1ss_cap = aspm_get_l1ss_cap(child); > > > > > > > > > + aspm_calc_l12_info(link, parent_l1ss_cap, > > > > > > > > > child_l1ss_cap); > > > > > > > > > > > > > > I still don't think this is the place to recalculate the > > > > > > > L1.2 parameters especially when know the calculation was > > > > > > > done but was cleared by pci_bus_reset(). Can't we just do a > > > > > > > pci_save/restore_state() before/after pci_bus_reset() in > > > > > > > vmd.c? > > > > > > > > > > > > I have not thought pci_save/restore_state() around > > > > > > pci_bus_reset() before. It is an interesting direction. > > > > > > > > > > > > So, I prepare modification below for test. Include "[PATCH > > > > > > v8 1/4] PCI: vmd: Set PCI devices to D0 before enable PCI > > > > > > PM's L1 substates", too. Then, both the PCIe bridge and the > > > > > > PCIe device have the same LTR_L1.2_THRESHOLD 101376ns as > > > > > > expected. > > > > > > > > > > > > diff --git a/drivers/pci/controller/vmd.c > > > > > > b/drivers/pci/controller/vmd.c index > > > > > > bbf4a47e7b31..6b8dd4f30127 100644 --- > > > > > > a/drivers/pci/controller/vmd.c +++ > > > > > > b/drivers/pci/controller/vmd.c @@ -727,6 +727,18 @@ static > > > > > > void vmd_copy_host_bridge_flags(struct pci_host_bridge > > > > > > *root_bridge, vmd_bridge->native_dpc = > > > > > > root_bridge->native_dpc; } > > > > > > > > > > > > +static int vmd_pci_save_state(struct pci_dev *pdev, void > > > > > > *userdata) +{ > > > > > > + pci_save_state(pdev); > > > > > > + return 0; > > > > > > +} > > > > > > + > > > > > > +static int vmd_pci_restore_state(struct pci_dev *pdev, void > > > > > > *userdata) +{ > > > > > > + pci_restore_state(pdev); > > > > > > + return 0; > > > > > > +} > > > > > > + > > > > > > /* > > > > > > * Enable ASPM and LTR settings on devices that aren't > > > > > > configured by BIOS. */ > > > > > > @@ -927,6 +939,7 @@ static int vmd_enable_domain(struct > > > > > > vmd_dev *vmd, unsigned long features) > > > > > > pci_scan_child_bus(vmd->bus); > > > > > > vmd_domain_reset(vmd); > > > > > > > > > > > > + pci_walk_bus(vmd->bus, vmd_pci_save_state, NULL); > > > > > > /* When Intel VMD is enabled, the OS does not > > > > > > discover the Root Ports > > > > > > * owned by Intel VMD within the MMCFG space. > > > > > > pci_reset_bus() applies > > > > > > * a reset to the parent of the PCI device supplied > > > > > > as argument. This > > > > > > @@ -945,6 +958,7 @@ static int vmd_enable_domain(struct > > > > > > vmd_dev *vmd, unsigned long features) > > > > > > break; > > > > > > } > > > > > > } > > > > > > + pci_walk_bus(vmd->bus, vmd_pci_restore_state, NULL); > > > > > > > > > > Why not call pci_reset_bus() (or __pci_reset_bus()) then in > > > > > vmd_enable_domain() which preserves state unlike > > > > > pci_reset_bus()? > > > > > > > > > > (Don't tell me naming of these functions is a horrible mess. > > > > > :-/) > > > > > > > > Hmm. So this *is* calling pci_reset_bus(). > > > > > > Yeah, I managed to get confused by the names myself, I somehow > > > ended up thinking it calls pci_bus_reset() which is not correct... > > > > > > > L1.2 configuration has specific > > > > ordering requirements for changes to parent & child devices. > > > > Could be why it's not getting restored properly. > > > > > > Indeed, it has to be something else since the patch above doesn't > > > even restore anything because dev->state_saved should get set to > > > false by the first pci_restore_state() called from > > > __pci_reset_bus() -> pci_bus_restore_locked() -> pci_dev_restore(), > > > I think!? > > > > Inspired by Ilpo's comment. I add some debug messages based on > > linux-next's tag 'next-20240809' to understand the code path of > > pci_reset_bus(): > > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > > index ffaaca0978cb..3ee71374f1de 100644 > > --- a/drivers/pci/pci.c > > +++ b/drivers/pci/pci.c > > @@ -5133,8 +5133,10 @@ static void pci_dev_save_and_disable(struct > > pci_dev *dev) > > * races with ->remove() by the device lock, which must be > > held by > > * the caller. > > */ > > - if (err_handler && err_handler->reset_prepare) > > + if (err_handler && err_handler->reset_prepare) { > > + pci_info(dev, "%s: %pF\n", __func__, > > err_handler->reset_prepare); > > err_handler->reset_prepare(dev); > > + } > > > > /* > > * Wake-up device prior to save. PM registers default to D0 > > after @@ -5144,6 +5146,7 @@ static void > > pci_dev_save_and_disable(struct pci_dev *dev) > > pci_set_power_state(dev, PCI_D0); > > > > pci_save_state(dev); > > + pci_info(dev, "%s: PCI state_saved is %s\n", __func__, > > dev->state_saved ? "true" : "false"); > > /* > > * Disable the device by clearing the Command register, > > except for > > * INTx-disable which is set. This not only disables MMIO > > and I/O port @@ -5655,6 +5658,10 @@ static void > > pci_bus_save_and_disable_locked(struct pci_bus *bus) > > struct pci_dev *dev; > > > > list_for_each_entry(dev, &bus->devices, bus_list) { > > + pci_info(dev, "%s: PCI state_saved is %s, and %s > > subordinate\n", > > + __func__, > > + dev->state_saved ? "true" : "false", > > + dev->subordinate ? "has" : "does not have"); > > pci_dev_save_and_disable(dev); > > if (dev->subordinate) > > pci_bus_save_and_disable_locked(dev->subordinate); > > @@ -5671,6 +5678,10 @@ static void pci_bus_restore_locked(struct > > pci_bus *bus) struct pci_dev *dev; > > > > list_for_each_entry(dev, &bus->devices, bus_list) { > > + pci_info(dev, "%s: PCI state_saved is %s, and %s > > subordinate\n", > > + __func__, > > + dev->state_saved ? "true" : "false", > > + dev->subordinate ? "has" : "does not have"); > > pci_dev_restore(dev); > > if (dev->subordinate) > > pci_bus_restore_locked(dev->subordinate); > > @@ -5786,8 +5797,10 @@ static int pci_bus_reset(struct pci_bus *bus, > > bool probe) if (!bus->self || !pci_bus_resettable(bus)) > > return -ENOTTY; > > > > - if (probe) > > + if (probe) { > > + pci_info(bus->self, "%s: probe is true. So return 0 > > directly", __func__); > > return 0; > > + } > > > > pci_bus_lock(bus); > > > > @@ -5858,10 +5871,12 @@ static int __pci_reset_bus(struct pci_bus > > *bus) int rc; > > > > rc = pci_bus_reset(bus, PCI_RESET_PROBE); > > + pci_info(bus->self, "%s: pci_bus_reset() returns %d\n", > > __func__, rc); if (rc) > > return rc; > > > > if (pci_bus_trylock(bus)) { > > + pci_info(bus->self, "%s: pci_bus_trylock() returns > > true\n", __func__); > > pci_bus_save_and_disable_locked(bus); > > might_sleep(); > > rc = pci_bridge_secondary_bus_reset(bus->self); > > @@ -5881,6 +5896,7 @@ static int __pci_reset_bus(struct pci_bus *bus) > > */ > > int pci_reset_bus(struct pci_dev *pdev) > > { > > + pci_info(pdev, "%s: %s", __func__, > > !pci_probe_reset_slot(pdev->slot) ? "true" : "false"); > > return (!pci_probe_reset_slot(pdev->slot)) ? > > __pci_reset_slot(pdev->slot) : __pci_reset_bus(pdev->bus); > > } > > > > And, have the information of VMD PCIe devices with the built kernel: > > > > 10000:e0:06.0 PCI bridge [0604]: Intel Corporation 11th Gen Core > > Processor PCIe Controller [8086:9a09] (rev 01) (prog-if 00 [Normal > > decode]) > > ... > > Capabilities: [200 v1] L1 PM Substates > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ > > L1_PM_Substates+ PortCommonModeRestoreTime=45us PortTPowerOnTime=50us > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > T_CommonMode=0us LTR1.2_Threshold=0ns > > L1SubCtl2: T_PwrOn=0us > > > > 10000:e1:00.0 Non-Volatile memory controller [0108]: Sandisk Corp WD > > Blue SN550 NVMe SSD [15b7:5009] (rev 01) (prog-if 02 [NVM Express]) > > ... > > Capabilities: [900 v1] L1 PM Substates > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > L1_PM_Substates+ PortCommonModeRestoreTime=32us PortTPowerOnTime=10us > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > T_CommonMode=0us LTR1.2_Threshold=101376ns > > L1SubCtl2: T_PwrOn=50us > > > > We can see the NVMe has expected LTR1.2_Threshold=101376ns, but the > > PCIe bridge has LTR1.2_Threshold=0ns. > > > > Then, check the dmesg. I notice the debug messages: > > > > pci 10000:e0:06.0: PCI bridge to [bus e1] > > pci 10000:e0:06.0: Primary bus is hard wired to 0 > > pci 10000:e1:00.0: pci_reset_bus: false > > pci 10000:e0:06.0: pci_bus_reset: probe is true. So return 0 directly > > pci 10000:e0:06.0: __pci_reset_bus: pci_bus_reset() returns 0 > > pci 10000:e0:06.0: __pci_reset_bus: pci_bus_trylock() returns true > > pci 10000:e1:00.0: pci_bus_save_and_disable_locked: PCI state_saved is > > false, and does not have subordinate > > pci 10000:e1:00.0: pci_dev_save_and_disable: PCI state_saved is true > > Freeing initrd memory: 75236K > > pci 10000:e1:00.0: pci_bus_restore_locked: PCI state_saved is true, > > and does not have subordinate > > > > So, the code path is: > > > > vmd_enable_domain() > > pci_reset_bus() > > __pci_reset_bus() > > pci_bus_reset() > > pci_bus_save_and_disable_locked() > > pci_dev_save_and_disable() > > pci_bus_restore_locked() > > pci_dev_restore() > > > > And, from the debug messages, I learned only NVMe 10000:e1:00.0 does > > pci_save/restore_state. But, the PCIe bridge 10000:e0:06.0 does not. > > So, PCIe bridge 10000:e0:06.0 does not restore state correctly. > > > > Besides, it is NVMe 10000:e1:00.0's bus [e1] been reset, not the VMD's > > bus in vmd_enable_domain(). > > * Bus "e1" has only NVMe 10000:e1:00.0 > > * VMD's bus in vmd_enable_domain() has PCIe bridge 10000:e0:06.0, NVMe > > 10000:e1:00.0 and SATA Controller 10000:e0:17.0. > > > > Here is the PCI tree: > > > > -+-[0000:00]-+-00.0 Intel Corporation Device 9a04 > > | +-02.0 Intel Corporation Tiger Lake-LP GT2 [UHD > > Graphics G4] | +-04.0 Intel Corporation TigerLake-LP > > Dynamic Tuning Processor Participant > > | +-06.0 Intel Corporation RST VMD Managed Controller > > | +-07.0-[01-2b]-- > > | +-08.0 Intel Corporation GNA Scoring Accelerator module > > | +-0a.0 Intel Corporation Tigerlake Telemetry Aggregator > > Driver | +-0d.0 Intel Corporation Tiger Lake-LP > > Thunderbolt 4 USB Controller > > | +-0d.2 Intel Corporation Tiger Lake-LP Thunderbolt 4 > > NHI #0 | +-0e.0 Intel Corporation Volume Management Device > > NVMe RAID Controller > > | +-14.0 Intel Corporation Tiger Lake-LP USB 3.2 Gen 2x1 > > xHCI Host Controller > > | +-14.2 Intel Corporation Tiger Lake-LP Shared SRAM > > | +-14.3 Intel Corporation Wi-Fi 6 AX201 > > | +-15.0 Intel Corporation Tiger Lake-LP Serial IO I2C > > Controller #0 | +-15.1 Intel Corporation Tiger Lake-LP > > Serial IO I2C Controller #1 | +-16.0 Intel Corporation > > Tiger Lake-LP Management Engine Interface | +-17.0 Intel > > Corporation RST VMD Managed Controller | +-1f.0 Intel > > Corporation Tiger Lake-LP LPC Controller | +-1f.3 Intel > > Corporation Tiger Lake-LP Smart Sound Technology Audio Controller > > | +-1f.4 Intel Corporation Tiger Lake-LP SMBus Controller > > | +-1f.5 Intel Corporation Tiger Lake-LP SPI Controller > > | \-1f.6 Intel Corporation Ethernet Connection (13) I219-V > > \-[10000:e0]-+-06.0-[e1]----00.0 Sandisk Corp WD Blue SN550 NVMe SSD > > \-17.0 Intel Corporation Tiger Lake-LP SATA Controller > > > > According the findings above, to ensure the devices on the VMD bus > > have correctly states, seems pci_save_state() all the devices before > > pci_reset_bus(), and pci_restore_state() all the devices after > > pci_reset_bus() is the correct answer. > What happens if you call pci_reset_bus with PCIe bridge 10000:e0:06.0 > instead of NVMe 10000:e1:00.0? I believe the current implementation in > vmd_enable_domain finds first child device on each rootport and calls > pci_reset_bus with NVMe. If call pci_reset_bus with PCIe bridge 10000:e0:06.0 instead of NVMe 10000:e1:00.0, it will show warning message "pci 10000:e0:06.0 can't reset device: -25". Jian-Hong Pan
On Mon, 12 Aug 2024 16:18:22 +0800 Jian-Hong Pan <jhp@endlessos.org> wrote: > Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> 於 2024年8月8日 週四 > 下午5:49寫道: > > > > On Wed, 7 Aug 2024, David E. Box wrote: > > > > > On Wed, 2024-08-07 at 14:18 +0300, Ilpo Järvinen wrote: > > > > On Wed, 7 Aug 2024, Jian-Hong Pan wrote: > > > > > > > > > David E. Box <david.e.box@linux.intel.com> 於 2024年8月6日 > > > > > 週二 上午4:26寫道: > > > > > > > > > > > > Hi Jian-Hong, > > > > > > > > > > > > On Fri, 2024-08-02 at 16:24 +0800, Jian-Hong Pan wrote: > > > > > > > Jian-Hong Pan <jhp@endlessos.org> 於 2024年7月19日 週五 > > > > > > > 下午4:04寫道: > > > > > > > > > > > > > > > > Currently, when enable link's L1.2 features with > > > > > > > > __pci_enable_link_state(), > > > > > > > > it configs the link directly without ensuring related > > > > > > > > L1.2 parameters, such > > > > > > > > as T_POWER_ON, Common_Mode_Restore_Time, and > > > > > > > > LTR_L1.2_THRESHOLD have been > > > > > > > > programmed. > > > > > > > > > > > > > > > > This leads the link's L1.2 between PCIe Root Port and > > > > > > > > child device gets > > > > > > > > wrong configs when a caller tries to enabled it. > > > > > > > > > > > > > > > > Here is a failed example on ASUS B1400CEAE with enabled > > > > > > > > VMD: > > > > > > > > > > > > > > > > 10000:e0:06.0 PCI bridge: Intel Corporation 11th Gen > > > > > > > > Core Processor PCIe > > > > > > > > Controller (rev 01) (prog-if 00 [Normal decode]) > > > > > > > > ... > > > > > > > > Capabilities: [200 v1] L1 PM Substates > > > > > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ > > > > > > > > ASPM_L1.1+ L1_PM_Substates+ > > > > > > > > PortCommonModeRestoreTime=45us > > > > > > > > PortTPowerOnTime=50us L1SubCtl1: PCI-PM_L1.2- > > > > > > > > PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- T_CommonMode=45us > > > > > > > > LTR1.2_Threshold=101376ns L1SubCtl2: T_PwrOn=50us > > > > > > > > > > > > > > > > 10000:e1:00.0 Non-Volatile memory controller: Sandisk > > > > > > > > Corp WD Blue SN550 > > > > > > > > NVMe SSD (rev 01) (prog-if 02 [NVM Express]) > > > > > > > > ... > > > > > > > > Capabilities: [900 v1] L1 PM Substates > > > > > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ > > > > > > > > ASPM_L1.1- L1_PM_Substates+ > > > > > > > > PortCommonModeRestoreTime=32us > > > > > > > > PortTPowerOnTime=10us L1SubCtl1: PCI-PM_L1.2- > > > > > > > > PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- T_CommonMode=0us > > > > > > > > LTR1.2_Threshold=0ns L1SubCtl2: T_PwrOn=10us > > > > > > > > > > > > > > > > According to "PCIe r6.0, sec 5.5.4", before enabling > > > > > > > > ASPM L1.2 on the PCIe > > > > > > > > Root Port and the child NVMe, they should be programmed > > > > > > > > with the same LTR1.2_Threshold value. However, they > > > > > > > > have different values in this case. > > > > > > > > > > > > > > > > Invoke aspm_calc_l12_info() to program the L1.2 > > > > > > > > parameters properly before > > > > > > > > enable L1.2 bits of L1 PM Substates Control Register in > > > > > > > > __pci_enable_link_state(). > > > > > > > > > > > > > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218394 > > > > > > > > Signed-off-by: Jian-Hong Pan <jhp@endlessos.org> > > > > > > > > --- > > > > > > > > v2: > > > > > > > > - Prepare the PCIe LTR parameters before enable L1 > > > > > > > > Substates > > > > > > > > > > > > > > > > v3: > > > > > > > > - Only enable supported features for the L1 Substates > > > > > > > > part > > > > > > > > > > > > > > > > v4: > > > > > > > > - Focus on fixing L1.2 parameters, instead of > > > > > > > > re-initializing whole L1SS > > > > > > > > > > > > > > > > v5: > > > > > > > > - Fix typo and commit message > > > > > > > > - Split introducing aspm_get_l1ss_cap() to "PCI/ASPM: > > > > > > > > Introduce aspm_get_l1ss_cap()" > > > > > > > > > > > > > > > > v6: > > > > > > > > - Skipped > > > > > > > > > > > > > > > > v7: > > > > > > > > - Pick back and rebase on the new version kernel > > > > > > > > - Drop the link state flag check. And, always config > > > > > > > > link state's timing > > > > > > > > parameters > > > > > > > > > > > > > > > > v8: > > > > > > > > - Because pcie_aspm_get_link() might return the link as > > > > > > > > NULL, move getting the link's parent and child devices > > > > > > > > after check the link is not NULL. This avoids NULL > > > > > > > > memory access. > > > > > > > > > > > > > > > > drivers/pci/pcie/aspm.c | 15 +++++++++++++++ > > > > > > > > 1 file changed, 15 insertions(+) > > > > > > > > > > > > > > > > diff --git a/drivers/pci/pcie/aspm.c > > > > > > > > b/drivers/pci/pcie/aspm.c index > > > > > > > > 5db1044c9895..55ff1d26fcea 100644 --- > > > > > > > > a/drivers/pci/pcie/aspm.c +++ b/drivers/pci/pcie/aspm.c > > > > > > > > @@ -1411,9 +1411,15 @@ > > > > > > > > EXPORT_SYMBOL(pci_disable_link_state); static int > > > > > > > > __pci_enable_link_state(struct pci_dev *pdev, int > > > > > > > > state, bool locked) > > > > > > > > { > > > > > > > > struct pcie_link_state *link = > > > > > > > > pcie_aspm_get_link(pdev); > > > > > > > > + u32 parent_l1ss_cap, child_l1ss_cap; > > > > > > > > + struct pci_dev *parent, *child; > > > > > > > > > > > > > > > > if (!link) > > > > > > > > return -EINVAL; > > > > > > > > + > > > > > > > > + parent = link->pdev; > > > > > > > > + child = link->downstream; > > > > > > > > + > > > > > > > > /* > > > > > > > > * A driver requested that ASPM be enabled on > > > > > > > > this device, but > > > > > > > > * if we don't have permission to manage ASPM > > > > > > > > (e.g., on ACPI @@ -1428,6 +1434,15 @@ static int > > > > > > > > __pci_enable_link_state(struct pci_dev > > > > > > > > *pdev, int state, bool locked) > > > > > > > > if (!locked) > > > > > > > > down_read(&pci_bus_sem); > > > > > > > > mutex_lock(&aspm_lock); > > > > > > > > + /* > > > > > > > > + * Ensure L1.2 parameters: > > > > > > > > Common_Mode_Restore_Times, T_POWER_ON and > > > > > > > > + * LTR_L1.2_THRESHOLD are programmed properly > > > > > > > > before enable bits for > > > > > > > > + * L1.2, per PCIe r6.0, sec 5.5.4. > > > > > > > > + */ > > > > > > > > + parent_l1ss_cap = aspm_get_l1ss_cap(parent); > > > > > > > > + child_l1ss_cap = aspm_get_l1ss_cap(child); > > > > > > > > + aspm_calc_l12_info(link, parent_l1ss_cap, > > > > > > > > child_l1ss_cap); > > > > > > > > > > > > I still don't think this is the place to recalculate the > > > > > > L1.2 parameters especially when know the calculation was > > > > > > done but was cleared by pci_bus_reset(). Can't we just do a > > > > > > pci_save/restore_state() before/after pci_bus_reset() in > > > > > > vmd.c? > > > > > > > > > > I have not thought pci_save/restore_state() around > > > > > pci_bus_reset() before. It is an interesting direction. > > > > > > > > > > So, I prepare modification below for test. Include "[PATCH > > > > > v8 1/4] PCI: vmd: Set PCI devices to D0 before enable PCI > > > > > PM's L1 substates", too. Then, both the PCIe bridge and the > > > > > PCIe device have the same LTR_L1.2_THRESHOLD 101376ns as > > > > > expected. > > > > > > > > > > diff --git a/drivers/pci/controller/vmd.c > > > > > b/drivers/pci/controller/vmd.c index > > > > > bbf4a47e7b31..6b8dd4f30127 100644 --- > > > > > a/drivers/pci/controller/vmd.c +++ > > > > > b/drivers/pci/controller/vmd.c @@ -727,6 +727,18 @@ static > > > > > void vmd_copy_host_bridge_flags(struct pci_host_bridge > > > > > *root_bridge, vmd_bridge->native_dpc = > > > > > root_bridge->native_dpc; } > > > > > > > > > > +static int vmd_pci_save_state(struct pci_dev *pdev, void > > > > > *userdata) +{ > > > > > + pci_save_state(pdev); > > > > > + return 0; > > > > > +} > > > > > + > > > > > +static int vmd_pci_restore_state(struct pci_dev *pdev, void > > > > > *userdata) +{ > > > > > + pci_restore_state(pdev); > > > > > + return 0; > > > > > +} > > > > > + > > > > > /* > > > > > * Enable ASPM and LTR settings on devices that aren't > > > > > configured by BIOS. */ > > > > > @@ -927,6 +939,7 @@ static int vmd_enable_domain(struct > > > > > vmd_dev *vmd, unsigned long features) > > > > > pci_scan_child_bus(vmd->bus); > > > > > vmd_domain_reset(vmd); > > > > > > > > > > + pci_walk_bus(vmd->bus, vmd_pci_save_state, NULL); > > > > > /* When Intel VMD is enabled, the OS does not > > > > > discover the Root Ports > > > > > * owned by Intel VMD within the MMCFG space. > > > > > pci_reset_bus() applies > > > > > * a reset to the parent of the PCI device supplied > > > > > as argument. This > > > > > @@ -945,6 +958,7 @@ static int vmd_enable_domain(struct > > > > > vmd_dev *vmd, unsigned long features) > > > > > break; > > > > > } > > > > > } > > > > > + pci_walk_bus(vmd->bus, vmd_pci_restore_state, NULL); > > > > > > > > Why not call pci_reset_bus() (or __pci_reset_bus()) then in > > > > vmd_enable_domain() which preserves state unlike > > > > pci_reset_bus()? > > > > > > > > (Don't tell me naming of these functions is a horrible mess. > > > > :-/) > > > > > > Hmm. So this *is* calling pci_reset_bus(). > > > > Yeah, I managed to get confused by the names myself, I somehow > > ended up thinking it calls pci_bus_reset() which is not correct... > > > > > L1.2 configuration has specific > > > ordering requirements for changes to parent & child devices. > > > Could be why it's not getting restored properly. > > > > Indeed, it has to be something else since the patch above doesn't > > even restore anything because dev->state_saved should get set to > > false by the first pci_restore_state() called from > > __pci_reset_bus() -> pci_bus_restore_locked() -> pci_dev_restore(), > > I think!? > > Inspired by Ilpo's comment. I add some debug messages based on > linux-next's tag 'next-20240809' to understand the code path of > pci_reset_bus(): > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index ffaaca0978cb..3ee71374f1de 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -5133,8 +5133,10 @@ static void pci_dev_save_and_disable(struct > pci_dev *dev) > * races with ->remove() by the device lock, which must be > held by > * the caller. > */ > - if (err_handler && err_handler->reset_prepare) > + if (err_handler && err_handler->reset_prepare) { > + pci_info(dev, "%s: %pF\n", __func__, > err_handler->reset_prepare); > err_handler->reset_prepare(dev); > + } > > /* > * Wake-up device prior to save. PM registers default to D0 > after @@ -5144,6 +5146,7 @@ static void > pci_dev_save_and_disable(struct pci_dev *dev) > pci_set_power_state(dev, PCI_D0); > > pci_save_state(dev); > + pci_info(dev, "%s: PCI state_saved is %s\n", __func__, > dev->state_saved ? "true" : "false"); > /* > * Disable the device by clearing the Command register, > except for > * INTx-disable which is set. This not only disables MMIO > and I/O port @@ -5655,6 +5658,10 @@ static void > pci_bus_save_and_disable_locked(struct pci_bus *bus) > struct pci_dev *dev; > > list_for_each_entry(dev, &bus->devices, bus_list) { > + pci_info(dev, "%s: PCI state_saved is %s, and %s > subordinate\n", > + __func__, > + dev->state_saved ? "true" : "false", > + dev->subordinate ? "has" : "does not have"); > pci_dev_save_and_disable(dev); > if (dev->subordinate) > pci_bus_save_and_disable_locked(dev->subordinate); > @@ -5671,6 +5678,10 @@ static void pci_bus_restore_locked(struct > pci_bus *bus) struct pci_dev *dev; > > list_for_each_entry(dev, &bus->devices, bus_list) { > + pci_info(dev, "%s: PCI state_saved is %s, and %s > subordinate\n", > + __func__, > + dev->state_saved ? "true" : "false", > + dev->subordinate ? "has" : "does not have"); > pci_dev_restore(dev); > if (dev->subordinate) > pci_bus_restore_locked(dev->subordinate); > @@ -5786,8 +5797,10 @@ static int pci_bus_reset(struct pci_bus *bus, > bool probe) if (!bus->self || !pci_bus_resettable(bus)) > return -ENOTTY; > > - if (probe) > + if (probe) { > + pci_info(bus->self, "%s: probe is true. So return 0 > directly", __func__); > return 0; > + } > > pci_bus_lock(bus); > > @@ -5858,10 +5871,12 @@ static int __pci_reset_bus(struct pci_bus > *bus) int rc; > > rc = pci_bus_reset(bus, PCI_RESET_PROBE); > + pci_info(bus->self, "%s: pci_bus_reset() returns %d\n", > __func__, rc); if (rc) > return rc; > > if (pci_bus_trylock(bus)) { > + pci_info(bus->self, "%s: pci_bus_trylock() returns > true\n", __func__); > pci_bus_save_and_disable_locked(bus); > might_sleep(); > rc = pci_bridge_secondary_bus_reset(bus->self); > @@ -5881,6 +5896,7 @@ static int __pci_reset_bus(struct pci_bus *bus) > */ > int pci_reset_bus(struct pci_dev *pdev) > { > + pci_info(pdev, "%s: %s", __func__, > !pci_probe_reset_slot(pdev->slot) ? "true" : "false"); > return (!pci_probe_reset_slot(pdev->slot)) ? > __pci_reset_slot(pdev->slot) : __pci_reset_bus(pdev->bus); > } > > And, have the information of VMD PCIe devices with the built kernel: > > 10000:e0:06.0 PCI bridge [0604]: Intel Corporation 11th Gen Core > Processor PCIe Controller [8086:9a09] (rev 01) (prog-if 00 [Normal > decode]) > ... > Capabilities: [200 v1] L1 PM Substates > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ > L1_PM_Substates+ PortCommonModeRestoreTime=45us PortTPowerOnTime=50us > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > T_CommonMode=0us LTR1.2_Threshold=0ns > L1SubCtl2: T_PwrOn=0us > > 10000:e1:00.0 Non-Volatile memory controller [0108]: Sandisk Corp WD > Blue SN550 NVMe SSD [15b7:5009] (rev 01) (prog-if 02 [NVM Express]) > ... > Capabilities: [900 v1] L1 PM Substates > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > L1_PM_Substates+ PortCommonModeRestoreTime=32us PortTPowerOnTime=10us > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > T_CommonMode=0us LTR1.2_Threshold=101376ns > L1SubCtl2: T_PwrOn=50us > > We can see the NVMe has expected LTR1.2_Threshold=101376ns, but the > PCIe bridge has LTR1.2_Threshold=0ns. > > Then, check the dmesg. I notice the debug messages: > > pci 10000:e0:06.0: PCI bridge to [bus e1] > pci 10000:e0:06.0: Primary bus is hard wired to 0 > pci 10000:e1:00.0: pci_reset_bus: false > pci 10000:e0:06.0: pci_bus_reset: probe is true. So return 0 directly > pci 10000:e0:06.0: __pci_reset_bus: pci_bus_reset() returns 0 > pci 10000:e0:06.0: __pci_reset_bus: pci_bus_trylock() returns true > pci 10000:e1:00.0: pci_bus_save_and_disable_locked: PCI state_saved is > false, and does not have subordinate > pci 10000:e1:00.0: pci_dev_save_and_disable: PCI state_saved is true > Freeing initrd memory: 75236K > pci 10000:e1:00.0: pci_bus_restore_locked: PCI state_saved is true, > and does not have subordinate > > So, the code path is: > > vmd_enable_domain() > pci_reset_bus() > __pci_reset_bus() > pci_bus_reset() > pci_bus_save_and_disable_locked() > pci_dev_save_and_disable() > pci_bus_restore_locked() > pci_dev_restore() > > And, from the debug messages, I learned only NVMe 10000:e1:00.0 does > pci_save/restore_state. But, the PCIe bridge 10000:e0:06.0 does not. > So, PCIe bridge 10000:e0:06.0 does not restore state correctly. > > Besides, it is NVMe 10000:e1:00.0's bus [e1] been reset, not the VMD's > bus in vmd_enable_domain(). > * Bus "e1" has only NVMe 10000:e1:00.0 > * VMD's bus in vmd_enable_domain() has PCIe bridge 10000:e0:06.0, NVMe > 10000:e1:00.0 and SATA Controller 10000:e0:17.0. > > Here is the PCI tree: > > -+-[0000:00]-+-00.0 Intel Corporation Device 9a04 > | +-02.0 Intel Corporation Tiger Lake-LP GT2 [UHD > Graphics G4] | +-04.0 Intel Corporation TigerLake-LP > Dynamic Tuning Processor Participant > | +-06.0 Intel Corporation RST VMD Managed Controller > | +-07.0-[01-2b]-- > | +-08.0 Intel Corporation GNA Scoring Accelerator module > | +-0a.0 Intel Corporation Tigerlake Telemetry Aggregator > Driver | +-0d.0 Intel Corporation Tiger Lake-LP > Thunderbolt 4 USB Controller > | +-0d.2 Intel Corporation Tiger Lake-LP Thunderbolt 4 > NHI #0 | +-0e.0 Intel Corporation Volume Management Device > NVMe RAID Controller > | +-14.0 Intel Corporation Tiger Lake-LP USB 3.2 Gen 2x1 > xHCI Host Controller > | +-14.2 Intel Corporation Tiger Lake-LP Shared SRAM > | +-14.3 Intel Corporation Wi-Fi 6 AX201 > | +-15.0 Intel Corporation Tiger Lake-LP Serial IO I2C > Controller #0 | +-15.1 Intel Corporation Tiger Lake-LP > Serial IO I2C Controller #1 | +-16.0 Intel Corporation > Tiger Lake-LP Management Engine Interface | +-17.0 Intel > Corporation RST VMD Managed Controller | +-1f.0 Intel > Corporation Tiger Lake-LP LPC Controller | +-1f.3 Intel > Corporation Tiger Lake-LP Smart Sound Technology Audio Controller > | +-1f.4 Intel Corporation Tiger Lake-LP SMBus Controller > | +-1f.5 Intel Corporation Tiger Lake-LP SPI Controller > | \-1f.6 Intel Corporation Ethernet Connection (13) I219-V > \-[10000:e0]-+-06.0-[e1]----00.0 Sandisk Corp WD Blue SN550 NVMe SSD > \-17.0 Intel Corporation Tiger Lake-LP SATA Controller > > According the findings above, to ensure the devices on the VMD bus > have correctly states, seems pci_save_state() all the devices before > pci_reset_bus(), and pci_restore_state() all the devices after > pci_reset_bus() is the correct answer. Did you get a chance to test this theory? > > Jian-Hong Pan
Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> 於 2024年9月2日 週一 下午11:44寫道: > > On Mon, 12 Aug 2024, Jian-Hong Pan wrote: > > > Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> 於 2024年8月8日 週四 下午5:49寫道: > > > On Wed, 7 Aug 2024, David E. Box wrote: > > > > On Wed, 2024-08-07 at 14:18 +0300, Ilpo Järvinen wrote: > > > > > On Wed, 7 Aug 2024, Jian-Hong Pan wrote: > > > > > > > > > > > David E. Box <david.e.box@linux.intel.com> 於 2024年8月6日 週二 上午4:26寫道: > > > > > > > > > > > > > > Hi Jian-Hong, > > > > > > > > > > > > > > On Fri, 2024-08-02 at 16:24 +0800, Jian-Hong Pan wrote: > > > > > > > > Jian-Hong Pan <jhp@endlessos.org> 於 2024年7月19日 週五 下午4:04寫道: > > > > > > > > > > > > > > > > > > Currently, when enable link's L1.2 features with > > > > > > > > > __pci_enable_link_state(), > > > > > > > > > it configs the link directly without ensuring related L1.2 parameters, > > > > > > > > > such > > > > > > > > > as T_POWER_ON, Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD have > > > > > > > > > been > > > > > > > > > programmed. > > > > > > > > > > > > > > > > > > This leads the link's L1.2 between PCIe Root Port and child device > > > > > > > > > gets > > > > > > > > > wrong configs when a caller tries to enabled it. > > > > > > > > > > > > > > > > > > Here is a failed example on ASUS B1400CEAE with enabled VMD: > > > > > > > > > > > > > > > > > > 10000:e0:06.0 PCI bridge: Intel Corporation 11th Gen Core Processor > > > > > > > > > PCIe > > > > > > > > > Controller (rev 01) (prog-if 00 [Normal decode]) > > > > > > > > > ... > > > > > > > > > Capabilities: [200 v1] L1 PM Substates > > > > > > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ > > > > > > > > > L1_PM_Substates+ > > > > > > > > > PortCommonModeRestoreTime=45us PortTPowerOnTime=50us > > > > > > > > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > > > > > > > > T_CommonMode=45us LTR1.2_Threshold=101376ns > > > > > > > > > L1SubCtl2: T_PwrOn=50us > > > > > > > > > > > > > > > > > > 10000:e1:00.0 Non-Volatile memory controller: Sandisk Corp WD Blue > > > > > > > > > SN550 > > > > > > > > > NVMe SSD (rev 01) (prog-if 02 [NVM Express]) > > > > > > > > > ... > > > > > > > > > Capabilities: [900 v1] L1 PM Substates > > > > > > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > > > > > > > > L1_PM_Substates+ > > > > > > > > > PortCommonModeRestoreTime=32us PortTPowerOnTime=10us > > > > > > > > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > > > > > > > > T_CommonMode=0us LTR1.2_Threshold=0ns > > > > > > > > > L1SubCtl2: T_PwrOn=10us > > > > > > > > > > > > > > > > > > According to "PCIe r6.0, sec 5.5.4", before enabling ASPM L1.2 on the > > > > > > > > > PCIe > > > > > > > > > Root Port and the child NVMe, they should be programmed with the same > > > > > > > > > LTR1.2_Threshold value. However, they have different values in this > > > > > > > > > case. > > > > > > > > > > > > > > > > > > Invoke aspm_calc_l12_info() to program the L1.2 parameters properly > > > > > > > > > before > > > > > > > > > enable L1.2 bits of L1 PM Substates Control Register in > > > > > > > > > __pci_enable_link_state(). > > > > > > > > > > > > > > > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218394 > > > > > > > > > Signed-off-by: Jian-Hong Pan <jhp@endlessos.org> > > > > > > > > > --- > > > > > > > > > v2: > > > > > > > > > - Prepare the PCIe LTR parameters before enable L1 Substates > > > > > > > > > > > > > > > > > > v3: > > > > > > > > > - Only enable supported features for the L1 Substates part > > > > > > > > > > > > > > > > > > v4: > > > > > > > > > - Focus on fixing L1.2 parameters, instead of re-initializing whole > > > > > > > > > L1SS > > > > > > > > > > > > > > > > > > v5: > > > > > > > > > - Fix typo and commit message > > > > > > > > > - Split introducing aspm_get_l1ss_cap() to "PCI/ASPM: Introduce > > > > > > > > > aspm_get_l1ss_cap()" > > > > > > > > > > > > > > > > > > v6: > > > > > > > > > - Skipped > > > > > > > > > > > > > > > > > > v7: > > > > > > > > > - Pick back and rebase on the new version kernel > > > > > > > > > - Drop the link state flag check. And, always config link state's > > > > > > > > > timing > > > > > > > > > parameters > > > > > > > > > > > > > > > > > > v8: > > > > > > > > > - Because pcie_aspm_get_link() might return the link as NULL, move > > > > > > > > > getting the link's parent and child devices after check the link is > > > > > > > > > not NULL. This avoids NULL memory access. > > > > > > > > > > > > > > > > > > drivers/pci/pcie/aspm.c | 15 +++++++++++++++ > > > > > > > > > 1 file changed, 15 insertions(+) > > > > > > > > > > > > > > > > > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c > > > > > > > > > index 5db1044c9895..55ff1d26fcea 100644 > > > > > > > > > --- a/drivers/pci/pcie/aspm.c > > > > > > > > > +++ b/drivers/pci/pcie/aspm.c > > > > > > > > > @@ -1411,9 +1411,15 @@ EXPORT_SYMBOL(pci_disable_link_state); > > > > > > > > > static int __pci_enable_link_state(struct pci_dev *pdev, int state, > > > > > > > > > bool > > > > > > > > > locked) > > > > > > > > > { > > > > > > > > > struct pcie_link_state *link = pcie_aspm_get_link(pdev); > > > > > > > > > + u32 parent_l1ss_cap, child_l1ss_cap; > > > > > > > > > + struct pci_dev *parent, *child; > > > > > > > > > > > > > > > > > > if (!link) > > > > > > > > > return -EINVAL; > > > > > > > > > + > > > > > > > > > + parent = link->pdev; > > > > > > > > > + child = link->downstream; > > > > > > > > > + > > > > > > > > > /* > > > > > > > > > * A driver requested that ASPM be enabled on this device, but > > > > > > > > > * if we don't have permission to manage ASPM (e.g., on ACPI > > > > > > > > > @@ -1428,6 +1434,15 @@ static int __pci_enable_link_state(struct > > > > > > > > > pci_dev > > > > > > > > > *pdev, int state, bool locked) > > > > > > > > > if (!locked) > > > > > > > > > down_read(&pci_bus_sem); > > > > > > > > > mutex_lock(&aspm_lock); > > > > > > > > > + /* > > > > > > > > > + * Ensure L1.2 parameters: Common_Mode_Restore_Times, > > > > > > > > > T_POWER_ON and > > > > > > > > > + * LTR_L1.2_THRESHOLD are programmed properly before enable > > > > > > > > > bits for > > > > > > > > > + * L1.2, per PCIe r6.0, sec 5.5.4. > > > > > > > > > + */ > > > > > > > > > + parent_l1ss_cap = aspm_get_l1ss_cap(parent); > > > > > > > > > + child_l1ss_cap = aspm_get_l1ss_cap(child); > > > > > > > > > + aspm_calc_l12_info(link, parent_l1ss_cap, child_l1ss_cap); > > > > > > > > > > > > > > I still don't think this is the place to recalculate the L1.2 parameters > > > > > > > especially when know the calculation was done but was cleared by > > > > > > > pci_bus_reset(). Can't we just do a pci_save/restore_state() before/after > > > > > > > pci_bus_reset() in vmd.c? > > > > > > > > > > > > I have not thought pci_save/restore_state() around pci_bus_reset() > > > > > > before. It is an interesting direction. > > > > > > > > > > > > So, I prepare modification below for test. Include "[PATCH v8 1/4] > > > > > > PCI: vmd: Set PCI devices to D0 before enable PCI PM's L1 substates", > > > > > > too. Then, both the PCIe bridge and the PCIe device have the same > > > > > > LTR_L1.2_THRESHOLD 101376ns as expected. > > > > > > > > > > > > diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c > > > > > > index bbf4a47e7b31..6b8dd4f30127 100644 > > > > > > --- a/drivers/pci/controller/vmd.c > > > > > > +++ b/drivers/pci/controller/vmd.c > > > > > > @@ -727,6 +727,18 @@ static void vmd_copy_host_bridge_flags(struct > > > > > > pci_host_bridge *root_bridge, > > > > > > vmd_bridge->native_dpc = root_bridge->native_dpc; > > > > > > } > > > > > > > > > > > > +static int vmd_pci_save_state(struct pci_dev *pdev, void *userdata) > > > > > > +{ > > > > > > + pci_save_state(pdev); > > > > > > + return 0; > > > > > > +} > > > > > > + > > > > > > +static int vmd_pci_restore_state(struct pci_dev *pdev, void *userdata) > > > > > > +{ > > > > > > + pci_restore_state(pdev); > > > > > > + return 0; > > > > > > +} > > > > > > + > > > > > > /* > > > > > > * Enable ASPM and LTR settings on devices that aren't configured by BIOS. > > > > > > */ > > > > > > @@ -927,6 +939,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, > > > > > > unsigned long features) > > > > > > pci_scan_child_bus(vmd->bus); > > > > > > vmd_domain_reset(vmd); > > > > > > > > > > > > + pci_walk_bus(vmd->bus, vmd_pci_save_state, NULL); > > > > > > /* When Intel VMD is enabled, the OS does not discover the Root > > > > > > Ports > > > > > > * owned by Intel VMD within the MMCFG space. pci_reset_bus() > > > > > > applies > > > > > > * a reset to the parent of the PCI device supplied as argument. > > > > > > This > > > > > > @@ -945,6 +958,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, > > > > > > unsigned long features) > > > > > > break; > > > > > > } > > > > > > } > > > > > > + pci_walk_bus(vmd->bus, vmd_pci_restore_state, NULL); > > > > > > > > > > Why not call pci_reset_bus() (or __pci_reset_bus()) then in > > > > > vmd_enable_domain() which preserves state unlike pci_reset_bus()? > > > > > > > > > > (Don't tell me naming of these functions is a horrible mess. :-/) > > > > > > > > Hmm. So this *is* calling pci_reset_bus(). > > > > > > Yeah, I managed to get confused by the names myself, I somehow > > > ended up thinking it calls pci_bus_reset() which is not correct... > > > > > > > L1.2 configuration has specific > > > > ordering requirements for changes to parent & child devices. Could be why it's > > > > not getting restored properly. > > > > > > Indeed, it has to be something else since the patch above doesn't even > > > restore anything because dev->state_saved should get set to false by the > > > first pci_restore_state() called from > > > __pci_reset_bus() -> pci_bus_restore_locked() -> pci_dev_restore(), I > > > think!? > > > > Inspired by Ilpo's comment. I add some debug messages based on > > linux-next's tag 'next-20240809' to understand the code path of > > pci_reset_bus(): > > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > > index ffaaca0978cb..3ee71374f1de 100644 > > --- a/drivers/pci/pci.c > > +++ b/drivers/pci/pci.c > > @@ -5133,8 +5133,10 @@ static void pci_dev_save_and_disable(struct pci_dev *dev) > > * races with ->remove() by the device lock, which must be held by > > * the caller. > > */ > > - if (err_handler && err_handler->reset_prepare) > > + if (err_handler && err_handler->reset_prepare) { > > + pci_info(dev, "%s: %pF\n", __func__, > > err_handler->reset_prepare); > > err_handler->reset_prepare(dev); > > + } > > > > /* > > * Wake-up device prior to save. PM registers default to D0 after > > @@ -5144,6 +5146,7 @@ static void pci_dev_save_and_disable(struct pci_dev *dev) > > pci_set_power_state(dev, PCI_D0); > > > > pci_save_state(dev); > > + pci_info(dev, "%s: PCI state_saved is %s\n", __func__, > > dev->state_saved ? "true" : "false"); > > /* > > * Disable the device by clearing the Command register, except for > > * INTx-disable which is set. This not only disables MMIO and I/O port > > @@ -5655,6 +5658,10 @@ static void > > pci_bus_save_and_disable_locked(struct pci_bus *bus) > > struct pci_dev *dev; > > > > list_for_each_entry(dev, &bus->devices, bus_list) { > > + pci_info(dev, "%s: PCI state_saved is %s, and %s subordinate\n", > > + __func__, > > + dev->state_saved ? "true" : "false", > > + dev->subordinate ? "has" : "does not have"); > > pci_dev_save_and_disable(dev); > > if (dev->subordinate) > > pci_bus_save_and_disable_locked(dev->subordinate); > > @@ -5671,6 +5678,10 @@ static void pci_bus_restore_locked(struct pci_bus *bus) > > struct pci_dev *dev; > > > > list_for_each_entry(dev, &bus->devices, bus_list) { > > + pci_info(dev, "%s: PCI state_saved is %s, and %s subordinate\n", > > + __func__, > > + dev->state_saved ? "true" : "false", > > + dev->subordinate ? "has" : "does not have"); > > pci_dev_restore(dev); > > if (dev->subordinate) > > pci_bus_restore_locked(dev->subordinate); > > @@ -5786,8 +5797,10 @@ static int pci_bus_reset(struct pci_bus *bus, bool probe) > > if (!bus->self || !pci_bus_resettable(bus)) > > return -ENOTTY; > > > > - if (probe) > > + if (probe) { > > + pci_info(bus->self, "%s: probe is true. So return 0 > > directly", __func__); > > return 0; > > + } > > > > pci_bus_lock(bus); > > > > @@ -5858,10 +5871,12 @@ static int __pci_reset_bus(struct pci_bus *bus) > > int rc; > > > > rc = pci_bus_reset(bus, PCI_RESET_PROBE); > > + pci_info(bus->self, "%s: pci_bus_reset() returns %d\n", __func__, rc); > > if (rc) > > return rc; > > > > if (pci_bus_trylock(bus)) { > > + pci_info(bus->self, "%s: pci_bus_trylock() returns > > true\n", __func__); > > pci_bus_save_and_disable_locked(bus); > > might_sleep(); > > rc = pci_bridge_secondary_bus_reset(bus->self); > > @@ -5881,6 +5896,7 @@ static int __pci_reset_bus(struct pci_bus *bus) > > */ > > int pci_reset_bus(struct pci_dev *pdev) > > { > > + pci_info(pdev, "%s: %s", __func__, > > !pci_probe_reset_slot(pdev->slot) ? "true" : "false"); > > return (!pci_probe_reset_slot(pdev->slot)) ? > > __pci_reset_slot(pdev->slot) : __pci_reset_bus(pdev->bus); > > } > > > > And, have the information of VMD PCIe devices with the built kernel: > > > > 10000:e0:06.0 PCI bridge [0604]: Intel Corporation 11th Gen Core > > Processor PCIe Controller [8086:9a09] (rev 01) (prog-if 00 [Normal > > decode]) > > ... > > Capabilities: [200 v1] L1 PM Substates > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+ > > PortCommonModeRestoreTime=45us PortTPowerOnTime=50us > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > T_CommonMode=0us LTR1.2_Threshold=0ns > > L1SubCtl2: T_PwrOn=0us > > > > 10000:e1:00.0 Non-Volatile memory controller [0108]: Sandisk Corp WD > > Blue SN550 NVMe SSD [15b7:5009] (rev 01) (prog-if 02 [NVM Express]) > > ... > > Capabilities: [900 v1] L1 PM Substates > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+ > > PortCommonModeRestoreTime=32us PortTPowerOnTime=10us > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > T_CommonMode=0us LTR1.2_Threshold=101376ns > > L1SubCtl2: T_PwrOn=50us > > > > We can see the NVMe has expected LTR1.2_Threshold=101376ns, but the > > PCIe bridge has LTR1.2_Threshold=0ns. > > This is now the other way around as in the original posting that had > 0ns for 10000:e1:00.0 ?? > > Is this behavior even consistent or did you e.g. mess up some copy > pasting somewhere? > > > Then, check the dmesg. I notice the debug messages: > > > > pci 10000:e0:06.0: PCI bridge to [bus e1] > > pci 10000:e0:06.0: Primary bus is hard wired to 0 > > pci 10000:e1:00.0: pci_reset_bus: false > > pci 10000:e0:06.0: pci_bus_reset: probe is true. So return 0 directly > > pci 10000:e0:06.0: __pci_reset_bus: pci_bus_reset() returns 0 > > pci 10000:e0:06.0: __pci_reset_bus: pci_bus_trylock() returns true > > pci 10000:e1:00.0: pci_bus_save_and_disable_locked: PCI state_saved is > > false, and does not have subordinate > > pci 10000:e1:00.0: pci_dev_save_and_disable: PCI state_saved is true > > Freeing initrd memory: 75236K > > pci 10000:e1:00.0: pci_bus_restore_locked: PCI state_saved is true, > > and does not have subordinate > > > > So, the code path is: > > > > vmd_enable_domain() > > pci_reset_bus() > > __pci_reset_bus() > > pci_bus_reset() > > pci_bus_save_and_disable_locked() > > pci_dev_save_and_disable() > > pci_bus_restore_locked() > > pci_dev_restore() > > > > And, from the debug messages, I learned only NVMe 10000:e1:00.0 does > > pci_save/restore_state. But, the PCIe bridge 10000:e0:06.0 does not. > > So, PCIe bridge 10000:e0:06.0 does not restore state correctly. > > It should not be necessary to restore the bridge's configuration as it > ought to not be changed by the SBR itself, PCIe6 spec 7.5.1.3.13: > > "Port configuration registers must not be changed, except as required to > update Port status." > > While the second part wording leaves some leeway, I don't think any of > these field really fall under "Port status". > > > Besides, it is NVMe 10000:e1:00.0's bus [e1] been reset, not the VMD's > > bus in vmd_enable_domain(). > > * Bus "e1" has only NVMe 10000:e1:00.0 > > * VMD's bus in vmd_enable_domain() has PCIe bridge 10000:e0:06.0, NVMe > > 10000:e1:00.0 and SATA Controller 10000:e0:17.0. > > ...But even if those registers on the PCIe bridge were changed underneath > against the spec, it's not clear from your debug log why pci_dev_restore() > -> pci_restore_state() -> pci_restore_pcie_state() -> > pci_restore_aspm_l1ss_state() did not restore also parent's > LTR1.2_Threshold?? I think it should attempt to do that. > > -- > i. > > > Here is the PCI tree: > > > > -+-[0000:00]-+-00.0 Intel Corporation Device 9a04 > > | +-02.0 Intel Corporation Tiger Lake-LP GT2 [UHD Graphics G4] > > | +-04.0 Intel Corporation TigerLake-LP Dynamic Tuning > > Processor Participant > > | +-06.0 Intel Corporation RST VMD Managed Controller > > | +-07.0-[01-2b]-- > > | +-08.0 Intel Corporation GNA Scoring Accelerator module > > | +-0a.0 Intel Corporation Tigerlake Telemetry Aggregator Driver > > | +-0d.0 Intel Corporation Tiger Lake-LP Thunderbolt 4 USB > > Controller > > | +-0d.2 Intel Corporation Tiger Lake-LP Thunderbolt 4 NHI #0 > > | +-0e.0 Intel Corporation Volume Management Device NVMe > > RAID Controller > > | +-14.0 Intel Corporation Tiger Lake-LP USB 3.2 Gen 2x1 > > xHCI Host Controller > > | +-14.2 Intel Corporation Tiger Lake-LP Shared SRAM > > | +-14.3 Intel Corporation Wi-Fi 6 AX201 > > | +-15.0 Intel Corporation Tiger Lake-LP Serial IO I2C Controller #0 > > | +-15.1 Intel Corporation Tiger Lake-LP Serial IO I2C Controller #1 > > | +-16.0 Intel Corporation Tiger Lake-LP Management Engine Interface > > | +-17.0 Intel Corporation RST VMD Managed Controller > > | +-1f.0 Intel Corporation Tiger Lake-LP LPC Controller > > | +-1f.3 Intel Corporation Tiger Lake-LP Smart Sound > > Technology Audio Controller > > | +-1f.4 Intel Corporation Tiger Lake-LP SMBus Controller > > | +-1f.5 Intel Corporation Tiger Lake-LP SPI Controller > > | \-1f.6 Intel Corporation Ethernet Connection (13) I219-V > > \-[10000:e0]-+-06.0-[e1]----00.0 Sandisk Corp WD Blue SN550 NVMe SSD > > \-17.0 Intel Corporation Tiger Lake-LP SATA Controller > > > > According the findings above, to ensure the devices on the VMD bus > > have correctly states, seems pci_save_state() all the devices before > > pci_reset_bus(), and pci_restore_state() all the devices after > > pci_reset_bus() is the correct answer. I add some debug messages based on v6.11 as following: diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c index cee2365e54b8..404ce92f0152 100644 --- a/drivers/pci/pcie/aspm.c +++ b/drivers/pci/pcie/aspm.c @@ -109,21 +109,28 @@ void pci_restore_aspm_l1ss_state(struct pci_dev *pdev) u32 cl_ctl1, cl_ctl2, cl_l1_2_enable; u16 clnkctl, plnkctl; + pci_info(pdev, "%s", __func__); /* * In case BIOS enabled L1.2 when resuming, we need to disable it first * on the downstream component before the upstream. So, don't attempt to * restore either until we are at the downstream component. */ - if (pcie_downstream_port(pdev) || !parent) + if (pcie_downstream_port(pdev) || !parent) { + pci_info(pdev, "%s: %s", __func__, pcie_downstream_port(pdev) ? "is downstream port" : "not parent"); return; + } - if (!pdev->l1ss || !parent->l1ss) + if (!pdev->l1ss || !parent->l1ss) { + pci_info(pdev, "%s: %sdoes not have l1ss", __func__, !pdev->l1ss ? "" : "parent "); return; + } cl_save_state = pci_find_saved_ext_cap(pdev, PCI_EXT_CAP_ID_L1SS); pl_save_state = pci_find_saved_ext_cap(parent, PCI_EXT_CAP_ID_L1SS); - if (!cl_save_state || !pl_save_state) + if (!cl_save_state || !pl_save_state) { + pci_info(pdev, "%s: %sdid not save ext_cap", __func__, !cl_save_state ? "" : "parent "); return; + } cap = &cl_save_state->cap.data[0]; cl_ctl2 = *cap++; @@ -131,6 +138,7 @@ void pci_restore_aspm_l1ss_state(struct pci_dev *pdev) cap = &pl_save_state->cap.data[0]; pl_ctl2 = *cap++; pl_ctl1 = *cap; + pci_info(pdev, "%s: cl_ctl1: 0x%08x, cl_ctl2: 0x%08x, pl_ctl1: 0x%08x, pl_ctl2: 0x%08x", __func__, cl_ctl1, cl_ctl2, pl_ctl1, pl_ctl2); /* Make sure L0s/L1 are disabled before updating L1SS config */ pcie_capability_read_word(pdev, PCI_EXP_LNKCTL, &clnkctl); Here is the corresponding log: [ 0.418931] pci 10000:e0:06.0: PCI bridge to [bus e1] [ 0.418936] pci 10000:e0:06.0: Primary bus is hard wired to 0 [ 0.447474] Freeing initrd memory: 24700K [ 0.670789] pci 10000:e1:00.0: pci_restore_aspm_l1ss_state [ 0.670807] pci 10000:e1:00.0: pci_restore_aspm_l1ss_state: cl_ctl1: 0x40630000, cl_ctl2: 0x00000029, pl_ctl1: 0x00000000, pl_ctl2: 0x00000000 [ 0.670862] pci 10000:e0:06.0: bridge window [mem 0x76000000-0x760fffff]: assigned [ 0.670864] pci 10000:e0:17.0: BAR 0 [mem 0x76100000-0x76101fff]: assigned [ 0.670881] pci 10000:e0:06.0: bridge window [io size 0x1000]: can't assign; no space [ 0.670882] pci 10000:e0:06.0: bridge window [io size 0x1000]: failed to assign [ 0.670884] pci 10000:e0:17.0: BAR 5 [mem 0x76102000-0x761027ff]: assigned [ 0.670893] pci 10000:e0:17.0: BAR 1 [mem 0x76102800-0x761028ff]: assigned [ 0.670896] pci 10000:e0:17.0: BAR 4 [io size 0x0020]: can't assign; no space [ 0.670897] pci 10000:e0:17.0: BAR 4 [io size 0x0020]: failed to assign [ 0.670898] pci 10000:e0:17.0: BAR 2 [io size 0x0008]: can't assign; no space [ 0.670898] pci 10000:e0:17.0: BAR 2 [io size 0x0008]: failed to assign [ 0.670899] pci 10000:e0:17.0: BAR 3 [io size 0x0004]: can't assign; no space [ 0.670900] pci 10000:e0:17.0: BAR 3 [io size 0x0004]: failed to assign [ 0.670901] pci 10000:e1:00.0: BAR 0 [mem 0x76000000-0x76003fff 64bit]: assigned [ 0.670909] pci 10000:e1:00.0: BAR 4 [mem 0x76004000-0x760040ff 64bit]: assigned [ 0.670918] pci 10000:e0:06.0: PCI bridge to [bus e1] [ 0.670921] pci 10000:e0:06.0: bridge window [mem 0x76000000-0x760fffff] [ 0.670950] pci 10000:e1:00.0: VMD: Default LTR value set by driver [ 0.670977] pcieport 10000:e0:06.0: can't derive routing for PCI INT D [ 0.670979] pcieport 10000:e0:06.0: PCI INT D: no GSI [ 0.671043] pcieport 10000:e0:06.0: PME: Signaling with IRQ 143 [ 0.671092] vmd 0000:00:0e.0: Bound to PCI domain 10000 We can notice both pl_ctl1 and pl_ctl2 are 0x0. Because, the link's parent device (PCIe bridge 10000:e0:06.0) did not save state before reset. So, pci_restore_aspm_l1ss_state() restores parent's LTR1.2_Threshold with a wrong value 0. Jian-Hong Pan
Nirmal Patel <nirmal.patel@linux.intel.com> 於 2024年9月21日 週六 上午12:03寫道: > > On Mon, 12 Aug 2024 16:18:22 +0800 > Jian-Hong Pan <jhp@endlessos.org> wrote: > > > Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> 於 2024年8月8日 週四 > > 下午5:49寫道: > > > > > > On Wed, 7 Aug 2024, David E. Box wrote: > > > > > > > On Wed, 2024-08-07 at 14:18 +0300, Ilpo Järvinen wrote: > > > > > On Wed, 7 Aug 2024, Jian-Hong Pan wrote: > > > > > > > > > > > David E. Box <david.e.box@linux.intel.com> 於 2024年8月6日 > > > > > > 週二 上午4:26寫道: > > > > > > > > > > > > > > Hi Jian-Hong, > > > > > > > > > > > > > > On Fri, 2024-08-02 at 16:24 +0800, Jian-Hong Pan wrote: > > > > > > > > Jian-Hong Pan <jhp@endlessos.org> 於 2024年7月19日 週五 > > > > > > > > 下午4:04寫道: > > > > > > > > > > > > > > > > > > Currently, when enable link's L1.2 features with > > > > > > > > > __pci_enable_link_state(), > > > > > > > > > it configs the link directly without ensuring related > > > > > > > > > L1.2 parameters, such > > > > > > > > > as T_POWER_ON, Common_Mode_Restore_Time, and > > > > > > > > > LTR_L1.2_THRESHOLD have been > > > > > > > > > programmed. > > > > > > > > > > > > > > > > > > This leads the link's L1.2 between PCIe Root Port and > > > > > > > > > child device gets > > > > > > > > > wrong configs when a caller tries to enabled it. > > > > > > > > > > > > > > > > > > Here is a failed example on ASUS B1400CEAE with enabled > > > > > > > > > VMD: > > > > > > > > > > > > > > > > > > 10000:e0:06.0 PCI bridge: Intel Corporation 11th Gen > > > > > > > > > Core Processor PCIe > > > > > > > > > Controller (rev 01) (prog-if 00 [Normal decode]) > > > > > > > > > ... > > > > > > > > > Capabilities: [200 v1] L1 PM Substates > > > > > > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ > > > > > > > > > ASPM_L1.1+ L1_PM_Substates+ > > > > > > > > > PortCommonModeRestoreTime=45us > > > > > > > > > PortTPowerOnTime=50us L1SubCtl1: PCI-PM_L1.2- > > > > > > > > > PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- T_CommonMode=45us > > > > > > > > > LTR1.2_Threshold=101376ns L1SubCtl2: T_PwrOn=50us > > > > > > > > > > > > > > > > > > 10000:e1:00.0 Non-Volatile memory controller: Sandisk > > > > > > > > > Corp WD Blue SN550 > > > > > > > > > NVMe SSD (rev 01) (prog-if 02 [NVM Express]) > > > > > > > > > ... > > > > > > > > > Capabilities: [900 v1] L1 PM Substates > > > > > > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ > > > > > > > > > ASPM_L1.1- L1_PM_Substates+ > > > > > > > > > PortCommonModeRestoreTime=32us > > > > > > > > > PortTPowerOnTime=10us L1SubCtl1: PCI-PM_L1.2- > > > > > > > > > PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- T_CommonMode=0us > > > > > > > > > LTR1.2_Threshold=0ns L1SubCtl2: T_PwrOn=10us > > > > > > > > > > > > > > > > > > According to "PCIe r6.0, sec 5.5.4", before enabling > > > > > > > > > ASPM L1.2 on the PCIe > > > > > > > > > Root Port and the child NVMe, they should be programmed > > > > > > > > > with the same LTR1.2_Threshold value. However, they > > > > > > > > > have different values in this case. > > > > > > > > > > > > > > > > > > Invoke aspm_calc_l12_info() to program the L1.2 > > > > > > > > > parameters properly before > > > > > > > > > enable L1.2 bits of L1 PM Substates Control Register in > > > > > > > > > __pci_enable_link_state(). > > > > > > > > > > > > > > > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218394 > > > > > > > > > Signed-off-by: Jian-Hong Pan <jhp@endlessos.org> > > > > > > > > > --- > > > > > > > > > v2: > > > > > > > > > - Prepare the PCIe LTR parameters before enable L1 > > > > > > > > > Substates > > > > > > > > > > > > > > > > > > v3: > > > > > > > > > - Only enable supported features for the L1 Substates > > > > > > > > > part > > > > > > > > > > > > > > > > > > v4: > > > > > > > > > - Focus on fixing L1.2 parameters, instead of > > > > > > > > > re-initializing whole L1SS > > > > > > > > > > > > > > > > > > v5: > > > > > > > > > - Fix typo and commit message > > > > > > > > > - Split introducing aspm_get_l1ss_cap() to "PCI/ASPM: > > > > > > > > > Introduce aspm_get_l1ss_cap()" > > > > > > > > > > > > > > > > > > v6: > > > > > > > > > - Skipped > > > > > > > > > > > > > > > > > > v7: > > > > > > > > > - Pick back and rebase on the new version kernel > > > > > > > > > - Drop the link state flag check. And, always config > > > > > > > > > link state's timing > > > > > > > > > parameters > > > > > > > > > > > > > > > > > > v8: > > > > > > > > > - Because pcie_aspm_get_link() might return the link as > > > > > > > > > NULL, move getting the link's parent and child devices > > > > > > > > > after check the link is not NULL. This avoids NULL > > > > > > > > > memory access. > > > > > > > > > > > > > > > > > > drivers/pci/pcie/aspm.c | 15 +++++++++++++++ > > > > > > > > > 1 file changed, 15 insertions(+) > > > > > > > > > > > > > > > > > > diff --git a/drivers/pci/pcie/aspm.c > > > > > > > > > b/drivers/pci/pcie/aspm.c index > > > > > > > > > 5db1044c9895..55ff1d26fcea 100644 --- > > > > > > > > > a/drivers/pci/pcie/aspm.c +++ b/drivers/pci/pcie/aspm.c > > > > > > > > > @@ -1411,9 +1411,15 @@ > > > > > > > > > EXPORT_SYMBOL(pci_disable_link_state); static int > > > > > > > > > __pci_enable_link_state(struct pci_dev *pdev, int > > > > > > > > > state, bool locked) > > > > > > > > > { > > > > > > > > > struct pcie_link_state *link = > > > > > > > > > pcie_aspm_get_link(pdev); > > > > > > > > > + u32 parent_l1ss_cap, child_l1ss_cap; > > > > > > > > > + struct pci_dev *parent, *child; > > > > > > > > > > > > > > > > > > if (!link) > > > > > > > > > return -EINVAL; > > > > > > > > > + > > > > > > > > > + parent = link->pdev; > > > > > > > > > + child = link->downstream; > > > > > > > > > + > > > > > > > > > /* > > > > > > > > > * A driver requested that ASPM be enabled on > > > > > > > > > this device, but > > > > > > > > > * if we don't have permission to manage ASPM > > > > > > > > > (e.g., on ACPI @@ -1428,6 +1434,15 @@ static int > > > > > > > > > __pci_enable_link_state(struct pci_dev > > > > > > > > > *pdev, int state, bool locked) > > > > > > > > > if (!locked) > > > > > > > > > down_read(&pci_bus_sem); > > > > > > > > > mutex_lock(&aspm_lock); > > > > > > > > > + /* > > > > > > > > > + * Ensure L1.2 parameters: > > > > > > > > > Common_Mode_Restore_Times, T_POWER_ON and > > > > > > > > > + * LTR_L1.2_THRESHOLD are programmed properly > > > > > > > > > before enable bits for > > > > > > > > > + * L1.2, per PCIe r6.0, sec 5.5.4. > > > > > > > > > + */ > > > > > > > > > + parent_l1ss_cap = aspm_get_l1ss_cap(parent); > > > > > > > > > + child_l1ss_cap = aspm_get_l1ss_cap(child); > > > > > > > > > + aspm_calc_l12_info(link, parent_l1ss_cap, > > > > > > > > > child_l1ss_cap); > > > > > > > > > > > > > > I still don't think this is the place to recalculate the > > > > > > > L1.2 parameters especially when know the calculation was > > > > > > > done but was cleared by pci_bus_reset(). Can't we just do a > > > > > > > pci_save/restore_state() before/after pci_bus_reset() in > > > > > > > vmd.c? > > > > > > > > > > > > I have not thought pci_save/restore_state() around > > > > > > pci_bus_reset() before. It is an interesting direction. > > > > > > > > > > > > So, I prepare modification below for test. Include "[PATCH > > > > > > v8 1/4] PCI: vmd: Set PCI devices to D0 before enable PCI > > > > > > PM's L1 substates", too. Then, both the PCIe bridge and the > > > > > > PCIe device have the same LTR_L1.2_THRESHOLD 101376ns as > > > > > > expected. > > > > > > > > > > > > diff --git a/drivers/pci/controller/vmd.c > > > > > > b/drivers/pci/controller/vmd.c index > > > > > > bbf4a47e7b31..6b8dd4f30127 100644 --- > > > > > > a/drivers/pci/controller/vmd.c +++ > > > > > > b/drivers/pci/controller/vmd.c @@ -727,6 +727,18 @@ static > > > > > > void vmd_copy_host_bridge_flags(struct pci_host_bridge > > > > > > *root_bridge, vmd_bridge->native_dpc = > > > > > > root_bridge->native_dpc; } > > > > > > > > > > > > +static int vmd_pci_save_state(struct pci_dev *pdev, void > > > > > > *userdata) +{ > > > > > > + pci_save_state(pdev); > > > > > > + return 0; > > > > > > +} > > > > > > + > > > > > > +static int vmd_pci_restore_state(struct pci_dev *pdev, void > > > > > > *userdata) +{ > > > > > > + pci_restore_state(pdev); > > > > > > + return 0; > > > > > > +} > > > > > > + > > > > > > /* > > > > > > * Enable ASPM and LTR settings on devices that aren't > > > > > > configured by BIOS. */ > > > > > > @@ -927,6 +939,7 @@ static int vmd_enable_domain(struct > > > > > > vmd_dev *vmd, unsigned long features) > > > > > > pci_scan_child_bus(vmd->bus); > > > > > > vmd_domain_reset(vmd); > > > > > > > > > > > > + pci_walk_bus(vmd->bus, vmd_pci_save_state, NULL); > > > > > > /* When Intel VMD is enabled, the OS does not > > > > > > discover the Root Ports > > > > > > * owned by Intel VMD within the MMCFG space. > > > > > > pci_reset_bus() applies > > > > > > * a reset to the parent of the PCI device supplied > > > > > > as argument. This > > > > > > @@ -945,6 +958,7 @@ static int vmd_enable_domain(struct > > > > > > vmd_dev *vmd, unsigned long features) > > > > > > break; > > > > > > } > > > > > > } > > > > > > + pci_walk_bus(vmd->bus, vmd_pci_restore_state, NULL); > > > > > > > > > > Why not call pci_reset_bus() (or __pci_reset_bus()) then in > > > > > vmd_enable_domain() which preserves state unlike > > > > > pci_reset_bus()? > > > > > > > > > > (Don't tell me naming of these functions is a horrible mess. > > > > > :-/) > > > > > > > > Hmm. So this *is* calling pci_reset_bus(). > > > > > > Yeah, I managed to get confused by the names myself, I somehow > > > ended up thinking it calls pci_bus_reset() which is not correct... > > > > > > > L1.2 configuration has specific > > > > ordering requirements for changes to parent & child devices. > > > > Could be why it's not getting restored properly. > > > > > > Indeed, it has to be something else since the patch above doesn't > > > even restore anything because dev->state_saved should get set to > > > false by the first pci_restore_state() called from > > > __pci_reset_bus() -> pci_bus_restore_locked() -> pci_dev_restore(), > > > I think!? > > > > Inspired by Ilpo's comment. I add some debug messages based on > > linux-next's tag 'next-20240809' to understand the code path of > > pci_reset_bus(): > > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > > index ffaaca0978cb..3ee71374f1de 100644 > > --- a/drivers/pci/pci.c > > +++ b/drivers/pci/pci.c > > @@ -5133,8 +5133,10 @@ static void pci_dev_save_and_disable(struct > > pci_dev *dev) > > * races with ->remove() by the device lock, which must be > > held by > > * the caller. > > */ > > - if (err_handler && err_handler->reset_prepare) > > + if (err_handler && err_handler->reset_prepare) { > > + pci_info(dev, "%s: %pF\n", __func__, > > err_handler->reset_prepare); > > err_handler->reset_prepare(dev); > > + } > > > > /* > > * Wake-up device prior to save. PM registers default to D0 > > after @@ -5144,6 +5146,7 @@ static void > > pci_dev_save_and_disable(struct pci_dev *dev) > > pci_set_power_state(dev, PCI_D0); > > > > pci_save_state(dev); > > + pci_info(dev, "%s: PCI state_saved is %s\n", __func__, > > dev->state_saved ? "true" : "false"); > > /* > > * Disable the device by clearing the Command register, > > except for > > * INTx-disable which is set. This not only disables MMIO > > and I/O port @@ -5655,6 +5658,10 @@ static void > > pci_bus_save_and_disable_locked(struct pci_bus *bus) > > struct pci_dev *dev; > > > > list_for_each_entry(dev, &bus->devices, bus_list) { > > + pci_info(dev, "%s: PCI state_saved is %s, and %s > > subordinate\n", > > + __func__, > > + dev->state_saved ? "true" : "false", > > + dev->subordinate ? "has" : "does not have"); > > pci_dev_save_and_disable(dev); > > if (dev->subordinate) > > pci_bus_save_and_disable_locked(dev->subordinate); > > @@ -5671,6 +5678,10 @@ static void pci_bus_restore_locked(struct > > pci_bus *bus) struct pci_dev *dev; > > > > list_for_each_entry(dev, &bus->devices, bus_list) { > > + pci_info(dev, "%s: PCI state_saved is %s, and %s > > subordinate\n", > > + __func__, > > + dev->state_saved ? "true" : "false", > > + dev->subordinate ? "has" : "does not have"); > > pci_dev_restore(dev); > > if (dev->subordinate) > > pci_bus_restore_locked(dev->subordinate); > > @@ -5786,8 +5797,10 @@ static int pci_bus_reset(struct pci_bus *bus, > > bool probe) if (!bus->self || !pci_bus_resettable(bus)) > > return -ENOTTY; > > > > - if (probe) > > + if (probe) { > > + pci_info(bus->self, "%s: probe is true. So return 0 > > directly", __func__); > > return 0; > > + } > > > > pci_bus_lock(bus); > > > > @@ -5858,10 +5871,12 @@ static int __pci_reset_bus(struct pci_bus > > *bus) int rc; > > > > rc = pci_bus_reset(bus, PCI_RESET_PROBE); > > + pci_info(bus->self, "%s: pci_bus_reset() returns %d\n", > > __func__, rc); if (rc) > > return rc; > > > > if (pci_bus_trylock(bus)) { > > + pci_info(bus->self, "%s: pci_bus_trylock() returns > > true\n", __func__); > > pci_bus_save_and_disable_locked(bus); > > might_sleep(); > > rc = pci_bridge_secondary_bus_reset(bus->self); > > @@ -5881,6 +5896,7 @@ static int __pci_reset_bus(struct pci_bus *bus) > > */ > > int pci_reset_bus(struct pci_dev *pdev) > > { > > + pci_info(pdev, "%s: %s", __func__, > > !pci_probe_reset_slot(pdev->slot) ? "true" : "false"); > > return (!pci_probe_reset_slot(pdev->slot)) ? > > __pci_reset_slot(pdev->slot) : __pci_reset_bus(pdev->bus); > > } > > > > And, have the information of VMD PCIe devices with the built kernel: > > > > 10000:e0:06.0 PCI bridge [0604]: Intel Corporation 11th Gen Core > > Processor PCIe Controller [8086:9a09] (rev 01) (prog-if 00 [Normal > > decode]) > > ... > > Capabilities: [200 v1] L1 PM Substates > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ > > L1_PM_Substates+ PortCommonModeRestoreTime=45us PortTPowerOnTime=50us > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > T_CommonMode=0us LTR1.2_Threshold=0ns > > L1SubCtl2: T_PwrOn=0us > > > > 10000:e1:00.0 Non-Volatile memory controller [0108]: Sandisk Corp WD > > Blue SN550 NVMe SSD [15b7:5009] (rev 01) (prog-if 02 [NVM Express]) > > ... > > Capabilities: [900 v1] L1 PM Substates > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > L1_PM_Substates+ PortCommonModeRestoreTime=32us PortTPowerOnTime=10us > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > T_CommonMode=0us LTR1.2_Threshold=101376ns > > L1SubCtl2: T_PwrOn=50us > > > > We can see the NVMe has expected LTR1.2_Threshold=101376ns, but the > > PCIe bridge has LTR1.2_Threshold=0ns. > > > > Then, check the dmesg. I notice the debug messages: > > > > pci 10000:e0:06.0: PCI bridge to [bus e1] > > pci 10000:e0:06.0: Primary bus is hard wired to 0 > > pci 10000:e1:00.0: pci_reset_bus: false > > pci 10000:e0:06.0: pci_bus_reset: probe is true. So return 0 directly > > pci 10000:e0:06.0: __pci_reset_bus: pci_bus_reset() returns 0 > > pci 10000:e0:06.0: __pci_reset_bus: pci_bus_trylock() returns true > > pci 10000:e1:00.0: pci_bus_save_and_disable_locked: PCI state_saved is > > false, and does not have subordinate > > pci 10000:e1:00.0: pci_dev_save_and_disable: PCI state_saved is true > > Freeing initrd memory: 75236K > > pci 10000:e1:00.0: pci_bus_restore_locked: PCI state_saved is true, > > and does not have subordinate > > > > So, the code path is: > > > > vmd_enable_domain() > > pci_reset_bus() > > __pci_reset_bus() > > pci_bus_reset() > > pci_bus_save_and_disable_locked() > > pci_dev_save_and_disable() > > pci_bus_restore_locked() > > pci_dev_restore() > > > > And, from the debug messages, I learned only NVMe 10000:e1:00.0 does > > pci_save/restore_state. But, the PCIe bridge 10000:e0:06.0 does not. > > So, PCIe bridge 10000:e0:06.0 does not restore state correctly. > > > > Besides, it is NVMe 10000:e1:00.0's bus [e1] been reset, not the VMD's > > bus in vmd_enable_domain(). > > * Bus "e1" has only NVMe 10000:e1:00.0 > > * VMD's bus in vmd_enable_domain() has PCIe bridge 10000:e0:06.0, NVMe > > 10000:e1:00.0 and SATA Controller 10000:e0:17.0. > > > > Here is the PCI tree: > > > > -+-[0000:00]-+-00.0 Intel Corporation Device 9a04 > > | +-02.0 Intel Corporation Tiger Lake-LP GT2 [UHD > > Graphics G4] | +-04.0 Intel Corporation TigerLake-LP > > Dynamic Tuning Processor Participant > > | +-06.0 Intel Corporation RST VMD Managed Controller > > | +-07.0-[01-2b]-- > > | +-08.0 Intel Corporation GNA Scoring Accelerator module > > | +-0a.0 Intel Corporation Tigerlake Telemetry Aggregator > > Driver | +-0d.0 Intel Corporation Tiger Lake-LP > > Thunderbolt 4 USB Controller > > | +-0d.2 Intel Corporation Tiger Lake-LP Thunderbolt 4 > > NHI #0 | +-0e.0 Intel Corporation Volume Management Device > > NVMe RAID Controller > > | +-14.0 Intel Corporation Tiger Lake-LP USB 3.2 Gen 2x1 > > xHCI Host Controller > > | +-14.2 Intel Corporation Tiger Lake-LP Shared SRAM > > | +-14.3 Intel Corporation Wi-Fi 6 AX201 > > | +-15.0 Intel Corporation Tiger Lake-LP Serial IO I2C > > Controller #0 | +-15.1 Intel Corporation Tiger Lake-LP > > Serial IO I2C Controller #1 | +-16.0 Intel Corporation > > Tiger Lake-LP Management Engine Interface | +-17.0 Intel > > Corporation RST VMD Managed Controller | +-1f.0 Intel > > Corporation Tiger Lake-LP LPC Controller | +-1f.3 Intel > > Corporation Tiger Lake-LP Smart Sound Technology Audio Controller > > | +-1f.4 Intel Corporation Tiger Lake-LP SMBus Controller > > | +-1f.5 Intel Corporation Tiger Lake-LP SPI Controller > > | \-1f.6 Intel Corporation Ethernet Connection (13) I219-V > > \-[10000:e0]-+-06.0-[e1]----00.0 Sandisk Corp WD Blue SN550 NVMe SSD > > \-17.0 Intel Corporation Tiger Lake-LP SATA Controller > > > > According the findings above, to ensure the devices on the VMD bus > > have correctly states, seems pci_save_state() all the devices before > > pci_reset_bus(), and pci_restore_state() all the devices after > > pci_reset_bus() is the correct answer. > > Did you get a chance to test this theory? Yes. But, according the findings in the previous mail talking about restoring parent's LTR1.2_Threshold with a wrong value 0, it can be shrunk to save/restore the PCIe bridge's state. Preparing a new patch set. Jian-Hong Pan
On Mon, 23 Sep 2024, Jian-Hong Pan wrote: > Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> 於 2024年9月2日 週一 下午11:44寫道: > > > > On Mon, 12 Aug 2024, Jian-Hong Pan wrote: > > > > > Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> 於 2024年8月8日 週四 下午5:49寫道: > > > > On Wed, 7 Aug 2024, David E. Box wrote: > > > > > On Wed, 2024-08-07 at 14:18 +0300, Ilpo Järvinen wrote: > > > > > > On Wed, 7 Aug 2024, Jian-Hong Pan wrote: > > > > > > > > > > > > > David E. Box <david.e.box@linux.intel.com> 於 2024年8月6日 週二 上午4:26寫道: > > > > > > > > > > > > > > > > Hi Jian-Hong, > > > > > > > > > > > > > > > > On Fri, 2024-08-02 at 16:24 +0800, Jian-Hong Pan wrote: > > > > > > > > > Jian-Hong Pan <jhp@endlessos.org> 於 2024年7月19日 週五 下午4:04寫道: > > > > > > > > > > > > > > > > > > > > Currently, when enable link's L1.2 features with > > > > > > > > > > __pci_enable_link_state(), > > > > > > > > > > it configs the link directly without ensuring related L1.2 parameters, > > > > > > > > > > such > > > > > > > > > > as T_POWER_ON, Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD have > > > > > > > > > > been > > > > > > > > > > programmed. > > > > > > > > > > > > > > > > > > > > This leads the link's L1.2 between PCIe Root Port and child device > > > > > > > > > > gets > > > > > > > > > > wrong configs when a caller tries to enabled it. > > > > > > > > > > > > > > > > > > > > Here is a failed example on ASUS B1400CEAE with enabled VMD: > > > > > > > > > > > > > > > > > > > > 10000:e0:06.0 PCI bridge: Intel Corporation 11th Gen Core Processor > > > > > > > > > > PCIe > > > > > > > > > > Controller (rev 01) (prog-if 00 [Normal decode]) > > > > > > > > > > ... > > > > > > > > > > Capabilities: [200 v1] L1 PM Substates > > > > > > > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ > > > > > > > > > > L1_PM_Substates+ > > > > > > > > > > PortCommonModeRestoreTime=45us PortTPowerOnTime=50us > > > > > > > > > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > > > > > > > > > T_CommonMode=45us LTR1.2_Threshold=101376ns > > > > > > > > > > L1SubCtl2: T_PwrOn=50us > > > > > > > > > > > > > > > > > > > > 10000:e1:00.0 Non-Volatile memory controller: Sandisk Corp WD Blue > > > > > > > > > > SN550 > > > > > > > > > > NVMe SSD (rev 01) (prog-if 02 [NVM Express]) > > > > > > > > > > ... > > > > > > > > > > Capabilities: [900 v1] L1 PM Substates > > > > > > > > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > > > > > > > > > L1_PM_Substates+ > > > > > > > > > > PortCommonModeRestoreTime=32us PortTPowerOnTime=10us > > > > > > > > > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > > > > > > > > > T_CommonMode=0us LTR1.2_Threshold=0ns > > > > > > > > > > L1SubCtl2: T_PwrOn=10us > > > > > > > > > > > > > > > > > > > > According to "PCIe r6.0, sec 5.5.4", before enabling ASPM L1.2 on the > > > > > > > > > > PCIe > > > > > > > > > > Root Port and the child NVMe, they should be programmed with the same > > > > > > > > > > LTR1.2_Threshold value. However, they have different values in this > > > > > > > > > > case. > > > > > > > > > > > > > > > > > > > > Invoke aspm_calc_l12_info() to program the L1.2 parameters properly > > > > > > > > > > before > > > > > > > > > > enable L1.2 bits of L1 PM Substates Control Register in > > > > > > > > > > __pci_enable_link_state(). > > > > > > > > > > > > > > > > > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218394 > > > > > > > > > > Signed-off-by: Jian-Hong Pan <jhp@endlessos.org> > > > > > > > > > > --- > > > > > > > > > > v2: > > > > > > > > > > - Prepare the PCIe LTR parameters before enable L1 Substates > > > > > > > > > > > > > > > > > > > > v3: > > > > > > > > > > - Only enable supported features for the L1 Substates part > > > > > > > > > > > > > > > > > > > > v4: > > > > > > > > > > - Focus on fixing L1.2 parameters, instead of re-initializing whole > > > > > > > > > > L1SS > > > > > > > > > > > > > > > > > > > > v5: > > > > > > > > > > - Fix typo and commit message > > > > > > > > > > - Split introducing aspm_get_l1ss_cap() to "PCI/ASPM: Introduce > > > > > > > > > > aspm_get_l1ss_cap()" > > > > > > > > > > > > > > > > > > > > v6: > > > > > > > > > > - Skipped > > > > > > > > > > > > > > > > > > > > v7: > > > > > > > > > > - Pick back and rebase on the new version kernel > > > > > > > > > > - Drop the link state flag check. And, always config link state's > > > > > > > > > > timing > > > > > > > > > > parameters > > > > > > > > > > > > > > > > > > > > v8: > > > > > > > > > > - Because pcie_aspm_get_link() might return the link as NULL, move > > > > > > > > > > getting the link's parent and child devices after check the link is > > > > > > > > > > not NULL. This avoids NULL memory access. > > > > > > > > > > > > > > > > > > > > drivers/pci/pcie/aspm.c | 15 +++++++++++++++ > > > > > > > > > > 1 file changed, 15 insertions(+) > > > > > > > > > > > > > > > > > > > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c > > > > > > > > > > index 5db1044c9895..55ff1d26fcea 100644 > > > > > > > > > > --- a/drivers/pci/pcie/aspm.c > > > > > > > > > > +++ b/drivers/pci/pcie/aspm.c > > > > > > > > > > @@ -1411,9 +1411,15 @@ EXPORT_SYMBOL(pci_disable_link_state); > > > > > > > > > > static int __pci_enable_link_state(struct pci_dev *pdev, int state, > > > > > > > > > > bool > > > > > > > > > > locked) > > > > > > > > > > { > > > > > > > > > > struct pcie_link_state *link = pcie_aspm_get_link(pdev); > > > > > > > > > > + u32 parent_l1ss_cap, child_l1ss_cap; > > > > > > > > > > + struct pci_dev *parent, *child; > > > > > > > > > > > > > > > > > > > > if (!link) > > > > > > > > > > return -EINVAL; > > > > > > > > > > + > > > > > > > > > > + parent = link->pdev; > > > > > > > > > > + child = link->downstream; > > > > > > > > > > + > > > > > > > > > > /* > > > > > > > > > > * A driver requested that ASPM be enabled on this device, but > > > > > > > > > > * if we don't have permission to manage ASPM (e.g., on ACPI > > > > > > > > > > @@ -1428,6 +1434,15 @@ static int __pci_enable_link_state(struct > > > > > > > > > > pci_dev > > > > > > > > > > *pdev, int state, bool locked) > > > > > > > > > > if (!locked) > > > > > > > > > > down_read(&pci_bus_sem); > > > > > > > > > > mutex_lock(&aspm_lock); > > > > > > > > > > + /* > > > > > > > > > > + * Ensure L1.2 parameters: Common_Mode_Restore_Times, > > > > > > > > > > T_POWER_ON and > > > > > > > > > > + * LTR_L1.2_THRESHOLD are programmed properly before enable > > > > > > > > > > bits for > > > > > > > > > > + * L1.2, per PCIe r6.0, sec 5.5.4. > > > > > > > > > > + */ > > > > > > > > > > + parent_l1ss_cap = aspm_get_l1ss_cap(parent); > > > > > > > > > > + child_l1ss_cap = aspm_get_l1ss_cap(child); > > > > > > > > > > + aspm_calc_l12_info(link, parent_l1ss_cap, child_l1ss_cap); > > > > > > > > > > > > > > > > I still don't think this is the place to recalculate the L1.2 parameters > > > > > > > > especially when know the calculation was done but was cleared by > > > > > > > > pci_bus_reset(). Can't we just do a pci_save/restore_state() before/after > > > > > > > > pci_bus_reset() in vmd.c? > > > > > > > > > > > > > > I have not thought pci_save/restore_state() around pci_bus_reset() > > > > > > > before. It is an interesting direction. > > > > > > > > > > > > > > So, I prepare modification below for test. Include "[PATCH v8 1/4] > > > > > > > PCI: vmd: Set PCI devices to D0 before enable PCI PM's L1 substates", > > > > > > > too. Then, both the PCIe bridge and the PCIe device have the same > > > > > > > LTR_L1.2_THRESHOLD 101376ns as expected. > > > > > > > > > > > > > > diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c > > > > > > > index bbf4a47e7b31..6b8dd4f30127 100644 > > > > > > > --- a/drivers/pci/controller/vmd.c > > > > > > > +++ b/drivers/pci/controller/vmd.c > > > > > > > @@ -727,6 +727,18 @@ static void vmd_copy_host_bridge_flags(struct > > > > > > > pci_host_bridge *root_bridge, > > > > > > > vmd_bridge->native_dpc = root_bridge->native_dpc; > > > > > > > } > > > > > > > > > > > > > > +static int vmd_pci_save_state(struct pci_dev *pdev, void *userdata) > > > > > > > +{ > > > > > > > + pci_save_state(pdev); > > > > > > > + return 0; > > > > > > > +} > > > > > > > + > > > > > > > +static int vmd_pci_restore_state(struct pci_dev *pdev, void *userdata) > > > > > > > +{ > > > > > > > + pci_restore_state(pdev); > > > > > > > + return 0; > > > > > > > +} > > > > > > > + > > > > > > > /* > > > > > > > * Enable ASPM and LTR settings on devices that aren't configured by BIOS. > > > > > > > */ > > > > > > > @@ -927,6 +939,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, > > > > > > > unsigned long features) > > > > > > > pci_scan_child_bus(vmd->bus); > > > > > > > vmd_domain_reset(vmd); > > > > > > > > > > > > > > + pci_walk_bus(vmd->bus, vmd_pci_save_state, NULL); > > > > > > > /* When Intel VMD is enabled, the OS does not discover the Root > > > > > > > Ports > > > > > > > * owned by Intel VMD within the MMCFG space. pci_reset_bus() > > > > > > > applies > > > > > > > * a reset to the parent of the PCI device supplied as argument. > > > > > > > This > > > > > > > @@ -945,6 +958,7 @@ static int vmd_enable_domain(struct vmd_dev *vmd, > > > > > > > unsigned long features) > > > > > > > break; > > > > > > > } > > > > > > > } > > > > > > > + pci_walk_bus(vmd->bus, vmd_pci_restore_state, NULL); > > > > > > > > > > > > Why not call pci_reset_bus() (or __pci_reset_bus()) then in > > > > > > vmd_enable_domain() which preserves state unlike pci_reset_bus()? > > > > > > > > > > > > (Don't tell me naming of these functions is a horrible mess. :-/) > > > > > > > > > > Hmm. So this *is* calling pci_reset_bus(). > > > > > > > > Yeah, I managed to get confused by the names myself, I somehow > > > > ended up thinking it calls pci_bus_reset() which is not correct... > > > > > > > > > L1.2 configuration has specific > > > > > ordering requirements for changes to parent & child devices. Could be why it's > > > > > not getting restored properly. > > > > > > > > Indeed, it has to be something else since the patch above doesn't even > > > > restore anything because dev->state_saved should get set to false by the > > > > first pci_restore_state() called from > > > > __pci_reset_bus() -> pci_bus_restore_locked() -> pci_dev_restore(), I > > > > think!? > > > > > > Inspired by Ilpo's comment. I add some debug messages based on > > > linux-next's tag 'next-20240809' to understand the code path of > > > pci_reset_bus(): > > > > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > > > index ffaaca0978cb..3ee71374f1de 100644 > > > --- a/drivers/pci/pci.c > > > +++ b/drivers/pci/pci.c > > > @@ -5133,8 +5133,10 @@ static void pci_dev_save_and_disable(struct pci_dev *dev) > > > * races with ->remove() by the device lock, which must be held by > > > * the caller. > > > */ > > > - if (err_handler && err_handler->reset_prepare) > > > + if (err_handler && err_handler->reset_prepare) { > > > + pci_info(dev, "%s: %pF\n", __func__, > > > err_handler->reset_prepare); > > > err_handler->reset_prepare(dev); > > > + } > > > > > > /* > > > * Wake-up device prior to save. PM registers default to D0 after > > > @@ -5144,6 +5146,7 @@ static void pci_dev_save_and_disable(struct pci_dev *dev) > > > pci_set_power_state(dev, PCI_D0); > > > > > > pci_save_state(dev); > > > + pci_info(dev, "%s: PCI state_saved is %s\n", __func__, > > > dev->state_saved ? "true" : "false"); > > > /* > > > * Disable the device by clearing the Command register, except for > > > * INTx-disable which is set. This not only disables MMIO and I/O port > > > @@ -5655,6 +5658,10 @@ static void > > > pci_bus_save_and_disable_locked(struct pci_bus *bus) > > > struct pci_dev *dev; > > > > > > list_for_each_entry(dev, &bus->devices, bus_list) { > > > + pci_info(dev, "%s: PCI state_saved is %s, and %s subordinate\n", > > > + __func__, > > > + dev->state_saved ? "true" : "false", > > > + dev->subordinate ? "has" : "does not have"); > > > pci_dev_save_and_disable(dev); > > > if (dev->subordinate) > > > pci_bus_save_and_disable_locked(dev->subordinate); > > > @@ -5671,6 +5678,10 @@ static void pci_bus_restore_locked(struct pci_bus *bus) > > > struct pci_dev *dev; > > > > > > list_for_each_entry(dev, &bus->devices, bus_list) { > > > + pci_info(dev, "%s: PCI state_saved is %s, and %s subordinate\n", > > > + __func__, > > > + dev->state_saved ? "true" : "false", > > > + dev->subordinate ? "has" : "does not have"); > > > pci_dev_restore(dev); > > > if (dev->subordinate) > > > pci_bus_restore_locked(dev->subordinate); > > > @@ -5786,8 +5797,10 @@ static int pci_bus_reset(struct pci_bus *bus, bool probe) > > > if (!bus->self || !pci_bus_resettable(bus)) > > > return -ENOTTY; > > > > > > - if (probe) > > > + if (probe) { > > > + pci_info(bus->self, "%s: probe is true. So return 0 > > > directly", __func__); > > > return 0; > > > + } > > > > > > pci_bus_lock(bus); > > > > > > @@ -5858,10 +5871,12 @@ static int __pci_reset_bus(struct pci_bus *bus) > > > int rc; > > > > > > rc = pci_bus_reset(bus, PCI_RESET_PROBE); > > > + pci_info(bus->self, "%s: pci_bus_reset() returns %d\n", __func__, rc); > > > if (rc) > > > return rc; > > > > > > if (pci_bus_trylock(bus)) { > > > + pci_info(bus->self, "%s: pci_bus_trylock() returns > > > true\n", __func__); > > > pci_bus_save_and_disable_locked(bus); > > > might_sleep(); > > > rc = pci_bridge_secondary_bus_reset(bus->self); > > > @@ -5881,6 +5896,7 @@ static int __pci_reset_bus(struct pci_bus *bus) > > > */ > > > int pci_reset_bus(struct pci_dev *pdev) > > > { > > > + pci_info(pdev, "%s: %s", __func__, > > > !pci_probe_reset_slot(pdev->slot) ? "true" : "false"); > > > return (!pci_probe_reset_slot(pdev->slot)) ? > > > __pci_reset_slot(pdev->slot) : __pci_reset_bus(pdev->bus); > > > } > > > > > > And, have the information of VMD PCIe devices with the built kernel: > > > > > > 10000:e0:06.0 PCI bridge [0604]: Intel Corporation 11th Gen Core > > > Processor PCIe Controller [8086:9a09] (rev 01) (prog-if 00 [Normal > > > decode]) > > > ... > > > Capabilities: [200 v1] L1 PM Substates > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+ > > > PortCommonModeRestoreTime=45us PortTPowerOnTime=50us > > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > > T_CommonMode=0us LTR1.2_Threshold=0ns > > > L1SubCtl2: T_PwrOn=0us > > > > > > 10000:e1:00.0 Non-Volatile memory controller [0108]: Sandisk Corp WD > > > Blue SN550 NVMe SSD [15b7:5009] (rev 01) (prog-if 02 [NVM Express]) > > > ... > > > Capabilities: [900 v1] L1 PM Substates > > > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+ > > > PortCommonModeRestoreTime=32us PortTPowerOnTime=10us > > > L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- > > > T_CommonMode=0us LTR1.2_Threshold=101376ns > > > L1SubCtl2: T_PwrOn=50us > > > > > > We can see the NVMe has expected LTR1.2_Threshold=101376ns, but the > > > PCIe bridge has LTR1.2_Threshold=0ns. > > > > This is now the other way around as in the original posting that had > > 0ns for 10000:e1:00.0 ?? > > > > Is this behavior even consistent or did you e.g. mess up some copy > > pasting somewhere? > > > > > Then, check the dmesg. I notice the debug messages: > > > > > > pci 10000:e0:06.0: PCI bridge to [bus e1] > > > pci 10000:e0:06.0: Primary bus is hard wired to 0 > > > pci 10000:e1:00.0: pci_reset_bus: false > > > pci 10000:e0:06.0: pci_bus_reset: probe is true. So return 0 directly > > > pci 10000:e0:06.0: __pci_reset_bus: pci_bus_reset() returns 0 > > > pci 10000:e0:06.0: __pci_reset_bus: pci_bus_trylock() returns true > > > pci 10000:e1:00.0: pci_bus_save_and_disable_locked: PCI state_saved is > > > false, and does not have subordinate > > > pci 10000:e1:00.0: pci_dev_save_and_disable: PCI state_saved is true > > > Freeing initrd memory: 75236K > > > pci 10000:e1:00.0: pci_bus_restore_locked: PCI state_saved is true, > > > and does not have subordinate > > > > > > So, the code path is: > > > > > > vmd_enable_domain() > > > pci_reset_bus() > > > __pci_reset_bus() > > > pci_bus_reset() > > > pci_bus_save_and_disable_locked() > > > pci_dev_save_and_disable() > > > pci_bus_restore_locked() > > > pci_dev_restore() > > > > > > And, from the debug messages, I learned only NVMe 10000:e1:00.0 does > > > pci_save/restore_state. But, the PCIe bridge 10000:e0:06.0 does not. > > > So, PCIe bridge 10000:e0:06.0 does not restore state correctly. > > > > It should not be necessary to restore the bridge's configuration as it > > ought to not be changed by the SBR itself, PCIe6 spec 7.5.1.3.13: > > > > "Port configuration registers must not be changed, except as required to > > update Port status." > > > > While the second part wording leaves some leeway, I don't think any of > > these field really fall under "Port status". > > > > > Besides, it is NVMe 10000:e1:00.0's bus [e1] been reset, not the VMD's > > > bus in vmd_enable_domain(). > > > * Bus "e1" has only NVMe 10000:e1:00.0 > > > * VMD's bus in vmd_enable_domain() has PCIe bridge 10000:e0:06.0, NVMe > > > 10000:e1:00.0 and SATA Controller 10000:e0:17.0. > > > > ...But even if those registers on the PCIe bridge were changed underneath > > against the spec, it's not clear from your debug log why pci_dev_restore() > > -> pci_restore_state() -> pci_restore_pcie_state() -> > > pci_restore_aspm_l1ss_state() did not restore also parent's > > LTR1.2_Threshold?? I think it should attempt to do that. > > > > -- > > i. > > > > > Here is the PCI tree: > > > > > > -+-[0000:00]-+-00.0 Intel Corporation Device 9a04 > > > | +-02.0 Intel Corporation Tiger Lake-LP GT2 [UHD Graphics G4] > > > | +-04.0 Intel Corporation TigerLake-LP Dynamic Tuning > > > Processor Participant > > > | +-06.0 Intel Corporation RST VMD Managed Controller > > > | +-07.0-[01-2b]-- > > > | +-08.0 Intel Corporation GNA Scoring Accelerator module > > > | +-0a.0 Intel Corporation Tigerlake Telemetry Aggregator Driver > > > | +-0d.0 Intel Corporation Tiger Lake-LP Thunderbolt 4 USB > > > Controller > > > | +-0d.2 Intel Corporation Tiger Lake-LP Thunderbolt 4 NHI #0 > > > | +-0e.0 Intel Corporation Volume Management Device NVMe > > > RAID Controller > > > | +-14.0 Intel Corporation Tiger Lake-LP USB 3.2 Gen 2x1 > > > xHCI Host Controller > > > | +-14.2 Intel Corporation Tiger Lake-LP Shared SRAM > > > | +-14.3 Intel Corporation Wi-Fi 6 AX201 > > > | +-15.0 Intel Corporation Tiger Lake-LP Serial IO I2C Controller #0 > > > | +-15.1 Intel Corporation Tiger Lake-LP Serial IO I2C Controller #1 > > > | +-16.0 Intel Corporation Tiger Lake-LP Management Engine Interface > > > | +-17.0 Intel Corporation RST VMD Managed Controller > > > | +-1f.0 Intel Corporation Tiger Lake-LP LPC Controller > > > | +-1f.3 Intel Corporation Tiger Lake-LP Smart Sound > > > Technology Audio Controller > > > | +-1f.4 Intel Corporation Tiger Lake-LP SMBus Controller > > > | +-1f.5 Intel Corporation Tiger Lake-LP SPI Controller > > > | \-1f.6 Intel Corporation Ethernet Connection (13) I219-V > > > \-[10000:e0]-+-06.0-[e1]----00.0 Sandisk Corp WD Blue SN550 NVMe SSD > > > \-17.0 Intel Corporation Tiger Lake-LP SATA Controller > > > > > > According the findings above, to ensure the devices on the VMD bus > > > have correctly states, seems pci_save_state() all the devices before > > > pci_reset_bus(), and pci_restore_state() all the devices after > > > pci_reset_bus() is the correct answer. > > I add some debug messages based on v6.11 as following: > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c > index cee2365e54b8..404ce92f0152 100644 > --- a/drivers/pci/pcie/aspm.c > +++ b/drivers/pci/pcie/aspm.c > @@ -109,21 +109,28 @@ void pci_restore_aspm_l1ss_state(struct pci_dev *pdev) > u32 cl_ctl1, cl_ctl2, cl_l1_2_enable; > u16 clnkctl, plnkctl; > > + pci_info(pdev, "%s", __func__); > /* > * In case BIOS enabled L1.2 when resuming, we need to disable it first > * on the downstream component before the upstream. So, don't attempt to > * restore either until we are at the downstream component. > */ > - if (pcie_downstream_port(pdev) || !parent) > + if (pcie_downstream_port(pdev) || !parent) { > + pci_info(pdev, "%s: %s", __func__, > pcie_downstream_port(pdev) ? "is downstream port" : "not parent"); > return; > + } > > - if (!pdev->l1ss || !parent->l1ss) > + if (!pdev->l1ss || !parent->l1ss) { > + pci_info(pdev, "%s: %sdoes not have l1ss", __func__, > !pdev->l1ss ? "" : "parent "); > return; > + } > > cl_save_state = pci_find_saved_ext_cap(pdev, PCI_EXT_CAP_ID_L1SS); > pl_save_state = pci_find_saved_ext_cap(parent, PCI_EXT_CAP_ID_L1SS); > - if (!cl_save_state || !pl_save_state) > + if (!cl_save_state || !pl_save_state) { > + pci_info(pdev, "%s: %sdid not save ext_cap", __func__, > !cl_save_state ? "" : "parent "); > return; > + } > > cap = &cl_save_state->cap.data[0]; > cl_ctl2 = *cap++; > @@ -131,6 +138,7 @@ void pci_restore_aspm_l1ss_state(struct pci_dev *pdev) > cap = &pl_save_state->cap.data[0]; > pl_ctl2 = *cap++; > pl_ctl1 = *cap; > + pci_info(pdev, "%s: cl_ctl1: 0x%08x, cl_ctl2: 0x%08x, pl_ctl1: > 0x%08x, pl_ctl2: 0x%08x", __func__, cl_ctl1, cl_ctl2, pl_ctl1, > pl_ctl2); > > /* Make sure L0s/L1 are disabled before updating L1SS config */ > pcie_capability_read_word(pdev, PCI_EXP_LNKCTL, &clnkctl); > > Here is the corresponding log: > > [ 0.418931] pci 10000:e0:06.0: PCI bridge to [bus e1] > [ 0.418936] pci 10000:e0:06.0: Primary bus is hard wired to 0 > [ 0.447474] Freeing initrd memory: 24700K > [ 0.670789] pci 10000:e1:00.0: pci_restore_aspm_l1ss_state > [ 0.670807] pci 10000:e1:00.0: pci_restore_aspm_l1ss_state: > cl_ctl1: 0x40630000, cl_ctl2: 0x00000029, pl_ctl1: 0x00000000, > pl_ctl2: 0x00000000 > [ 0.670862] pci 10000:e0:06.0: bridge window [mem > 0x76000000-0x760fffff]: assigned > [ 0.670864] pci 10000:e0:17.0: BAR 0 [mem 0x76100000-0x76101fff]: assigned > [ 0.670881] pci 10000:e0:06.0: bridge window [io size 0x1000]: > can't assign; no space > [ 0.670882] pci 10000:e0:06.0: bridge window [io size 0x1000]: > failed to assign > [ 0.670884] pci 10000:e0:17.0: BAR 5 [mem 0x76102000-0x761027ff]: assigned > [ 0.670893] pci 10000:e0:17.0: BAR 1 [mem 0x76102800-0x761028ff]: assigned > [ 0.670896] pci 10000:e0:17.0: BAR 4 [io size 0x0020]: can't > assign; no space > [ 0.670897] pci 10000:e0:17.0: BAR 4 [io size 0x0020]: failed to assign > [ 0.670898] pci 10000:e0:17.0: BAR 2 [io size 0x0008]: can't > assign; no space > [ 0.670898] pci 10000:e0:17.0: BAR 2 [io size 0x0008]: failed to assign > [ 0.670899] pci 10000:e0:17.0: BAR 3 [io size 0x0004]: can't > assign; no space > [ 0.670900] pci 10000:e0:17.0: BAR 3 [io size 0x0004]: failed to assign > [ 0.670901] pci 10000:e1:00.0: BAR 0 [mem 0x76000000-0x76003fff > 64bit]: assigned > [ 0.670909] pci 10000:e1:00.0: BAR 4 [mem 0x76004000-0x760040ff > 64bit]: assigned > [ 0.670918] pci 10000:e0:06.0: PCI bridge to [bus e1] > [ 0.670921] pci 10000:e0:06.0: bridge window [mem 0x76000000-0x760fffff] > [ 0.670950] pci 10000:e1:00.0: VMD: Default LTR value set by driver > [ 0.670977] pcieport 10000:e0:06.0: can't derive routing for PCI INT D > [ 0.670979] pcieport 10000:e0:06.0: PCI INT D: no GSI > [ 0.671043] pcieport 10000:e0:06.0: PME: Signaling with IRQ 143 > [ 0.671092] vmd 0000:00:0e.0: Bound to PCI domain 10000 > > We can notice both pl_ctl1 and pl_ctl2 are 0x0. Because, the link's > parent device (PCIe bridge 10000:e0:06.0) did not save state before > reset. So, pci_restore_aspm_l1ss_state() restores parent's > LTR1.2_Threshold with a wrong value 0. Okay, this is very useful finding because it leads us to the root cause, which is the disparity between pci_save_aspm_l1ss_state() and pci_restore_aspm_l1ss_state(). The latter restores both dev and its parent, yet the save state side only saves the ASPM state for the dev itself, not for the parent.
diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c index 5db1044c9895..55ff1d26fcea 100644 --- a/drivers/pci/pcie/aspm.c +++ b/drivers/pci/pcie/aspm.c @@ -1411,9 +1411,15 @@ EXPORT_SYMBOL(pci_disable_link_state); static int __pci_enable_link_state(struct pci_dev *pdev, int state, bool locked) { struct pcie_link_state *link = pcie_aspm_get_link(pdev); + u32 parent_l1ss_cap, child_l1ss_cap; + struct pci_dev *parent, *child; if (!link) return -EINVAL; + + parent = link->pdev; + child = link->downstream; + /* * A driver requested that ASPM be enabled on this device, but * if we don't have permission to manage ASPM (e.g., on ACPI @@ -1428,6 +1434,15 @@ static int __pci_enable_link_state(struct pci_dev *pdev, int state, bool locked) if (!locked) down_read(&pci_bus_sem); mutex_lock(&aspm_lock); + /* + * Ensure L1.2 parameters: Common_Mode_Restore_Times, T_POWER_ON and + * LTR_L1.2_THRESHOLD are programmed properly before enable bits for + * L1.2, per PCIe r6.0, sec 5.5.4. + */ + parent_l1ss_cap = aspm_get_l1ss_cap(parent); + child_l1ss_cap = aspm_get_l1ss_cap(child); + aspm_calc_l12_info(link, parent_l1ss_cap, child_l1ss_cap); + link->aspm_default = pci_calc_aspm_enable_mask(state); pcie_config_aspm_link(link, policy_to_aspm_state(link));
Currently, when enable link's L1.2 features with __pci_enable_link_state(), it configs the link directly without ensuring related L1.2 parameters, such as T_POWER_ON, Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD have been programmed. This leads the link's L1.2 between PCIe Root Port and child device gets wrong configs when a caller tries to enabled it. Here is a failed example on ASUS B1400CEAE with enabled VMD: 10000:e0:06.0 PCI bridge: Intel Corporation 11th Gen Core Processor PCIe Controller (rev 01) (prog-if 00 [Normal decode]) ... Capabilities: [200 v1] L1 PM Substates L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+ PortCommonModeRestoreTime=45us PortTPowerOnTime=50us L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- T_CommonMode=45us LTR1.2_Threshold=101376ns L1SubCtl2: T_PwrOn=50us 10000:e1:00.0 Non-Volatile memory controller: Sandisk Corp WD Blue SN550 NVMe SSD (rev 01) (prog-if 02 [NVM Express]) ... Capabilities: [900 v1] L1 PM Substates L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+ PortCommonModeRestoreTime=32us PortTPowerOnTime=10us L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- T_CommonMode=0us LTR1.2_Threshold=0ns L1SubCtl2: T_PwrOn=10us According to "PCIe r6.0, sec 5.5.4", before enabling ASPM L1.2 on the PCIe Root Port and the child NVMe, they should be programmed with the same LTR1.2_Threshold value. However, they have different values in this case. Invoke aspm_calc_l12_info() to program the L1.2 parameters properly before enable L1.2 bits of L1 PM Substates Control Register in __pci_enable_link_state(). Link: https://bugzilla.kernel.org/show_bug.cgi?id=218394 Signed-off-by: Jian-Hong Pan <jhp@endlessos.org> --- v2: - Prepare the PCIe LTR parameters before enable L1 Substates v3: - Only enable supported features for the L1 Substates part v4: - Focus on fixing L1.2 parameters, instead of re-initializing whole L1SS v5: - Fix typo and commit message - Split introducing aspm_get_l1ss_cap() to "PCI/ASPM: Introduce aspm_get_l1ss_cap()" v6: - Skipped v7: - Pick back and rebase on the new version kernel - Drop the link state flag check. And, always config link state's timing parameters v8: - Because pcie_aspm_get_link() might return the link as NULL, move getting the link's parent and child devices after check the link is not NULL. This avoids NULL memory access. drivers/pci/pcie/aspm.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+)