[v2] PCI/ASPM: Disable L1 before disabling L1ss

Message ID 20241003132503.2279433-1-ajayagarwal@google.com (mailing list archive)
State Superseded
Series [v2] PCI/ASPM: Disable L1 before disabling L1ss

Commit Message

Ajay Agarwal Oct. 3, 2024, 1:25 p.m. UTC
The current sequence in the driver for L1ss update is as follows.

Disable L1ss
Disable L1
Enable L1ss as required
Enable L1 if required

With this sequence, a bus hang is observed during the L1ss
disable sequence when the RC CPU attempts to clear the RC L1ss
register after clearing the EP L1ss register. It looks like the
RC attempts to enter L1ss again and at the same time, access to
RC L1ss register fails because aux clk is still not active.

PCIe spec r6.2, section 5.5.4, recommends that setting either
or both of the enable bits for ASPM L1 PM Substates must be done
while ASPM L1 is disabled. My interpretation here is that
clearing L1ss should also be done when L1 is disabled. Thereby,
change the sequence as follows.

Disable L1
Disable L1ss
Enable L1ss as required
Enable L1 if required

Signed-off-by: Ajay Agarwal <ajayagarwal@google.com>
---
 drivers/pci/pcie/aspm.c | 50 ++++++++++++++++++++---------------------
 1 file changed, 24 insertions(+), 26 deletions(-)

Comments

Bjorn Helgaas Oct. 3, 2024, 5:01 p.m. UTC | #1
On Thu, Oct 03, 2024 at 06:55:03PM +0530, Ajay Agarwal wrote:
> The current sequence in the driver for L1ss update is as follows.
> 
> Disable L1ss
> Disable L1
> Enable L1ss as required
> Enable L1 if required
> 
> With this sequence, a bus hang is observed during the L1ss
> disable sequence when the RC CPU attempts to clear the RC L1ss
> register after clearing the EP L1ss register.

Thanks for this.  What exactly does the bus hang look like to a user?

I guess the problem happens in pcie_config_aspm_l1ss(), where we do:

  pci_clear_and_set_config_dword(child->l1ss + PCI_L1SS_CTL1, ... 0)
  pci_clear_and_set_config_dword(parent->l1ss + PCI_L1SS_CTL1, ... 0)

where clearing the child (endpoint) PCI_L1SS_CTL1_L1_2_MASK works, but
something goes wrong when clearing the parent (RP) mask?  The
clear_and_set will do a read followed by a write, and one of those
causes some kind of error?
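
For reference, the read-followed-by-write behavior described here can be modeled in plain C. This is only a user-space sketch, not the kernel code: struct fake_dev, fake_read_dword(), fake_write_dword() and fake_clear_and_set_dword() are hypothetical names, and the "config space" is just an array with access counters.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a device's config space; not a kernel type. */
#define FAKE_CFG_DWORDS 64

struct fake_dev {
	uint32_t cfg[FAKE_CFG_DWORDS];
	unsigned int reads;	/* number of config reads issued */
	unsigned int writes;	/* number of config writes issued */
};

static uint32_t fake_read_dword(struct fake_dev *dev, unsigned int pos)
{
	/* Only aligned, in-range dword offsets are modeled. */
	assert(pos % 4 == 0 && pos / 4 < FAKE_CFG_DWORDS);
	dev->reads++;
	return dev->cfg[pos / 4];
}

static void fake_write_dword(struct fake_dev *dev, unsigned int pos,
			     uint32_t val)
{
	assert(pos % 4 == 0 && pos / 4 < FAKE_CFG_DWORDS);
	dev->writes++;
	dev->cfg[pos / 4] = val;
}

/* Read-modify-write: clear the bits in 'clear', then set the bits in 'set'. */
static void fake_clear_and_set_dword(struct fake_dev *dev, unsigned int pos,
				     uint32_t clear, uint32_t set)
{
	uint32_t val = fake_read_dword(dev, pos);

	val &= ~clear;
	val |= set;
	fake_write_dword(dev, pos, val);
}
```

In this model every call issues exactly one read and one write, even when 'set' is 0, so a failure could in principle show up on either access.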

> It looks like the
> RC attempts to enter L1ss again and at the same time, access to
> RC L1ss register fails because aux clk is still not active.

I assume "access to RC L1ss register fails" means something like
"reading the Root Port PCI_L1SS_CTL1 register returns ~0" which I
guess would be the read part of the pci_clear_and_set_config_dword()?

~0 data might be returned because of some PCIe error like Unsupported
Request, Completion Timeout, etc?  Such an error should be logged in
the AER Capability.

This *sounds* like it would be a hardware defect in the Root Port.
This register is on the upstream end of the link, so I would think it
would be readable no matter what state the link is in.

Sec 5.5.4 requires that L1 be disabled in PCI_EXP_LNKCTL while
*setting* either of the ASPM L1 PM Substates enable bits.  I don't see
anything there about requiring that for *clearing* those enable bits.
But maybe it is required, and in any event I guess it's simpler to do
it as you do here and have L1 (indeed *all* ASPM) disabled while
configuring L1 SS.

> PCIe spec r6.2, section 5.5.4, recommends that setting either
> or both of the enable bits for ASPM L1 PM Substates must be done
> while ASPM L1 is disabled. My interpretation here is that
> clearing L1ss should also be done when L1 is disabled. Thereby,
> change the sequence as follows.
> 
> Disable L1
> Disable L1ss
> Enable L1ss as required
> Enable L1 if required
> 
> Signed-off-by: Ajay Agarwal <ajayagarwal@google.com>
> ---
>  drivers/pci/pcie/aspm.c | 50 ++++++++++++++++++++---------------------
>  1 file changed, 24 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index cee2365e54b8..c172886129f3 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -848,17 +848,13 @@ static void pcie_aspm_cap_init(struct pcie_link_state *link, int blacklist)
>  /* Configure the ASPM L1 substates */
>  static void pcie_config_aspm_l1ss(struct pcie_link_state *link, u32 state)
>  {
> -	u32 val, enable_req;
> +	u32 val;
>  	struct pci_dev *child = link->downstream, *parent = link->pdev;
>  
> -	enable_req = (link->aspm_enabled ^ state) & state;
> -
>  	/*
> -	 * Here are the rules specified in the PCIe spec for enabling L1SS:
> +	 * Spec r6.2, section 5.5.4, mentions the rules for enabling L1SS:
>  	 * - When enabling L1.x, enable bit at parent first, then at child
>  	 * - When disabling L1.x, disable bit at child first, then at parent
> -	 * - When enabling ASPM L1.x, need to disable L1
> -	 *   (at child followed by parent).
>  	 * - The ASPM/PCIPM L1.2 must be disabled while programming timing
>  	 *   parameters
>  	 *
> @@ -871,16 +867,6 @@ static void pcie_config_aspm_l1ss(struct pcie_link_state *link, u32 state)
>  				       PCI_L1SS_CTL1_L1SS_MASK, 0);
>  	pci_clear_and_set_config_dword(parent, parent->l1ss + PCI_L1SS_CTL1,
>  				       PCI_L1SS_CTL1_L1SS_MASK, 0);
> -	/*
> -	 * If needed, disable L1, and it gets enabled later
> -	 * in pcie_config_aspm_link().
> -	 */
> -	if (enable_req & (PCIE_LINK_STATE_L1_1 | PCIE_LINK_STATE_L1_2)) {
> -		pcie_capability_clear_word(child, PCI_EXP_LNKCTL,
> -					   PCI_EXP_LNKCTL_ASPM_L1);
> -		pcie_capability_clear_word(parent, PCI_EXP_LNKCTL,
> -					   PCI_EXP_LNKCTL_ASPM_L1);
> -	}
>  
>  	val = 0;
>  	if (state & PCIE_LINK_STATE_L1_1)
> @@ -937,21 +923,33 @@ static void pcie_config_aspm_link(struct pcie_link_state *link, u32 state)
>  		dwstream |= PCI_EXP_LNKCTL_ASPM_L1;
>  	}
>  
> +	/*
> +	 * Spec r6.2, section 5.5.4, recommends that setting either or both of
> +	 * the enable bits for ASPM L1 PM Substates must be done while ASPM L1
> +	 * is disabled. So disable L1 here, and it gets enabled later after the
> +	 * L1ss configuration has been completed.
> +	 *
> +	 * Spec r6.2, section 7.5.3.7, mentions that ASPM L1 must be enabled by
> +	 * software in the Upstream component on a Link prior to enabling ASPM
> +	 * L1 in the Downstream component on the Link. When disabling L1,
> +	 * software must disable ASPM L1 in the Downstream component on a Link
> +	 * prior to disabling ASPM L1 in the Upstream component on that Link.
> +	 *
> +	 * Spec doesn't mention L0s.
> +	 *
> +	 * Disable L1 and L0s here, and they get enabled later after the L1ss
> +	 * configuration has been completed.
> +	 */
> +	list_for_each_entry(child, &linkbus->devices, bus_list)
> +		pcie_config_aspm_dev(child, 0);
> +	pcie_config_aspm_dev(parent, 0);
> +
>  	if (link->aspm_capable & PCIE_LINK_STATE_L1SS)
>  		pcie_config_aspm_l1ss(link, state);
>  
> -	/*
> -	 * Spec 2.0 suggests all functions should be configured the
> -	 * same setting for ASPM. Enabling ASPM L1 should be done in
> -	 * upstream component first and then downstream, and vice
> -	 * versa for disabling ASPM L1. Spec doesn't mention L0S.
> -	 */
> -	if (state & PCIE_LINK_STATE_L1)
> -		pcie_config_aspm_dev(parent, upstream);
> +	pcie_config_aspm_dev(parent, upstream);
>  	list_for_each_entry(child, &linkbus->devices, bus_list)
>  		pcie_config_aspm_dev(child, dwstream);
> -	if (!(state & PCIE_LINK_STATE_L1))
> -		pcie_config_aspm_dev(parent, upstream);

I think the reason for having pcie_config_aspm_dev(parent) both before
and after configuring the children is because pcie_config_aspm_link()
may be called either to enable L1 or to disable it.

I guess your change always disables ASPM completely (disabling the
downstream (child) component first, then the upstream), and here we
are either leaving L1 disabled or enabling it, and in either case it
should be safe to configure the upstream (parent) component first,
then the downstream one.

Of course, we may also enable L0s here, and AFAICS it should always be
safe to do that in the upstream component first, followed by the
downstream one.

Bottom line, this looks good to me, and I think it's nice that this
removes the "parent then child" or "child then parent" logic here.
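
The resulting order can be written down as a small user-space model. This is a sketch under assumptions, not the patch itself: enum op, struct op_log, log_op() and model_config_link() are hypothetical names, and each logged entry stands in for the corresponding pcie_config_aspm_dev() or pcie_config_aspm_l1ss() call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the steps; not kernel identifiers. */
enum op { DIS_CHILD, DIS_PARENT, CFG_L1SS, SET_PARENT, SET_CHILD };

#define MAX_OPS 8

struct op_log {
	enum op ops[MAX_OPS];
	size_t n;
};

static void log_op(struct op_log *log, enum op op)
{
	assert(log->n < MAX_OPS);
	log->ops[log->n++] = op;
}

/* Model of the reordered link configuration flow. */
static void model_config_link(struct op_log *log, int l1ss_capable)
{
	/* Disable ASPM: downstream (child) first, then upstream (parent). */
	log_op(log, DIS_CHILD);
	log_op(log, DIS_PARENT);

	/* L1SS is reprogrammed only while L1 is off on both ends. */
	if (l1ss_capable)
		log_op(log, CFG_L1SS);

	/* Apply the requested state: upstream first, then downstream. */
	log_op(log, SET_PARENT);
	log_op(log, SET_CHILD);
}
```

The point of the model is that the order no longer depends on whether L1 is being enabled or disabled; the only conditional left is the L1SS capability check.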

>  	link->aspm_enabled = state;
>  
> -- 
> 2.46.1.824.gd892dcdcdd-goog
>

Ajay Agarwal Oct. 3, 2024, 5:23 p.m. UTC | #2
On Thu, Oct 03, 2024 at 12:01:22PM -0500, Bjorn Helgaas wrote:
> On Thu, Oct 03, 2024 at 06:55:03PM +0530, Ajay Agarwal wrote:
> > The current sequence in the driver for L1ss update is as follows.
> > 
> > Disable L1ss
> > Disable L1
> > Enable L1ss as required
> > Enable L1 if required
> > 
> > With this sequence, a bus hang is observed during the L1ss
> > disable sequence when the RC CPU attempts to clear the RC L1ss
> > register after clearing the EP L1ss register.
> 
> Thanks for this.  What exactly does the bus hang look like to a user?
>
The CPU is just hung on reading the RC PCI_L1SS_CTL1 register. After
some time, the CPU watchdog expires and the system reboots.

> I guess the problem happens in pcie_config_aspm_l1ss(), where we do:
> 
>   pci_clear_and_set_config_dword(child->l1ss + PCI_L1SS_CTL1, ... 0)
>   pci_clear_and_set_config_dword(parent->l1ss + PCI_L1SS_CTL1, ... 0)
> 
> where clearing the child (endpoint) PCI_L1SS_CTL1_L1_2_MASK works, but
> something goes wrong when clearing the parent (RP) mask?  The
> clear_and_set will do a read followed by a write, and one of those
> causes some kind of error?
>
During ASPM disable, in pcie_config_aspm_l1ss(), we do:
   1. pci_clear_and_set_config_dword(child->l1ss + PCI_L1SS_CTL1, ... 0)
   2. pci_clear_and_set_config_dword(parent->l1ss + PCI_L1SS_CTL1, ... 0)
   3. pci_clear_and_set_config_dword(parent->l1ss + PCI_L1SS_CTL1, ... 0)
   4. pci_clear_and_set_config_dword(child->l1ss + PCI_L1SS_CTL1, ... 0)

We observe that the steps 1 and 2 go through just fine. But the read of
PCI_L1SS_CTL1 register in the step 3 hangs. I am not sure why.
The issue is pretty difficult to reproduce, and adding prints around
these steps masks the issue.

> > It looks like the
> > RC attempts to enter L1ss again and at the same time, access to
> > RC L1ss register fails because aux clk is still not active.
> 
> I assume "access to RC L1ss register fails" means something like
> "reading the Root Port PCI_L1SS_CTL1 register returns ~0" which I
> guess would be the read part of the pci_clear_and_set_config_dword()?
> 
> ~0 data might be returned because of some PCIe error like Unsupported
> Request, Completion Timeout, etc?  Such an error should be logged in
> the AER Capability.
>
This is not a PCIe bus transaction. This is CPU on the RC side accessing
the RC side config register, so the link is not involved at all. Hence,
no timeout or other AER errors logged/reported. The AXI-DBI bus just
hangs.

> This *sounds* like it would be a hardware defect in the Root Port.
> This register is on the upstream end of the link, so I would think it
> would be readable no matter what state the link is in.
> 
Exactly. As described above, this is not a PCIe transaction.

> Sec 5.5.4 requires that L1 be disabled in PCI_EXP_LNKCTL while
> *setting* either of the ASPM L1 PM Substates enable bits.  I don't see
> anything there about requiring that for *clearing* those enable bits.
> But maybe it is required, and in any event I guess it's simpler to do
> it as you do here and have L1 (indeed *all* ASPM) disabled while
> configuring L1 SS.
> 
Right. The spec does not talk about the sequence when one wants to clear
these L1ss bits. But I am interpreting the word "setting" as "setting to
1" as well as "setting to 0".

> > PCIe spec r6.2, section 5.5.4, recommends that setting either
> > or both of the enable bits for ASPM L1 PM Substates must be done
> > while ASPM L1 is disabled. My interpretation here is that
> > clearing L1ss should also be done when L1 is disabled. Thereby,
> > change the sequence as follows.
> > 
> > Disable L1
> > Disable L1ss
> > Enable L1ss as required
> > Enable L1 if required
> > 
> > Signed-off-by: Ajay Agarwal <ajayagarwal@google.com>
> > ---
> >  drivers/pci/pcie/aspm.c | 50 ++++++++++++++++++++---------------------
> >  1 file changed, 24 insertions(+), 26 deletions(-)
> > 
> > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > index cee2365e54b8..c172886129f3 100644
> > --- a/drivers/pci/pcie/aspm.c
> > +++ b/drivers/pci/pcie/aspm.c
> > @@ -848,17 +848,13 @@ static void pcie_aspm_cap_init(struct pcie_link_state *link, int blacklist)
> >  /* Configure the ASPM L1 substates */
> >  static void pcie_config_aspm_l1ss(struct pcie_link_state *link, u32 state)
> >  {
> > -	u32 val, enable_req;
> > +	u32 val;
> >  	struct pci_dev *child = link->downstream, *parent = link->pdev;
> >  
> > -	enable_req = (link->aspm_enabled ^ state) & state;
> > -
> >  	/*
> > -	 * Here are the rules specified in the PCIe spec for enabling L1SS:
> > +	 * Spec r6.2, section 5.5.4, mentions the rules for enabling L1SS:
> >  	 * - When enabling L1.x, enable bit at parent first, then at child
> >  	 * - When disabling L1.x, disable bit at child first, then at parent
> > -	 * - When enabling ASPM L1.x, need to disable L1
> > -	 *   (at child followed by parent).
> >  	 * - The ASPM/PCIPM L1.2 must be disabled while programming timing
> >  	 *   parameters
> >  	 *
> > @@ -871,16 +867,6 @@ static void pcie_config_aspm_l1ss(struct pcie_link_state *link, u32 state)
> >  				       PCI_L1SS_CTL1_L1SS_MASK, 0);
> >  	pci_clear_and_set_config_dword(parent, parent->l1ss + PCI_L1SS_CTL1,
> >  				       PCI_L1SS_CTL1_L1SS_MASK, 0);
> > -	/*
> > -	 * If needed, disable L1, and it gets enabled later
> > -	 * in pcie_config_aspm_link().
> > -	 */
> > -	if (enable_req & (PCIE_LINK_STATE_L1_1 | PCIE_LINK_STATE_L1_2)) {
> > -		pcie_capability_clear_word(child, PCI_EXP_LNKCTL,
> > -					   PCI_EXP_LNKCTL_ASPM_L1);
> > -		pcie_capability_clear_word(parent, PCI_EXP_LNKCTL,
> > -					   PCI_EXP_LNKCTL_ASPM_L1);
> > -	}
> >  
> >  	val = 0;
> >  	if (state & PCIE_LINK_STATE_L1_1)
> > @@ -937,21 +923,33 @@ static void pcie_config_aspm_link(struct pcie_link_state *link, u32 state)
> >  		dwstream |= PCI_EXP_LNKCTL_ASPM_L1;
> >  	}
> >  
> > +	/*
> > +	 * Spec r6.2, section 5.5.4, recommends that setting either or both of
> > +	 * the enable bits for ASPM L1 PM Substates must be done while ASPM L1
> > +	 * is disabled. So disable L1 here, and it gets enabled later after the
> > +	 * L1ss configuration has been completed.
> > +	 *
> > +	 * Spec r6.2, section 7.5.3.7, mentions that ASPM L1 must be enabled by
> > +	 * software in the Upstream component on a Link prior to enabling ASPM
> > +	 * L1 in the Downstream component on the Link. When disabling L1,
> > +	 * software must disable ASPM L1 in the Downstream component on a Link
> > +	 * prior to disabling ASPM L1 in the Upstream component on that Link.
> > +	 *
> > +	 * Spec doesn't mention L0s.
> > +	 *
> > +	 * Disable L1 and L0s here, and they get enabled later after the L1ss
> > +	 * configuration has been completed.
> > +	 */
> > +	list_for_each_entry(child, &linkbus->devices, bus_list)
> > +		pcie_config_aspm_dev(child, 0);
> > +	pcie_config_aspm_dev(parent, 0);
> > +
> >  	if (link->aspm_capable & PCIE_LINK_STATE_L1SS)
> >  		pcie_config_aspm_l1ss(link, state);
> >  
> > -	/*
> > -	 * Spec 2.0 suggests all functions should be configured the
> > -	 * same setting for ASPM. Enabling ASPM L1 should be done in
> > -	 * upstream component first and then downstream, and vice
> > -	 * versa for disabling ASPM L1. Spec doesn't mention L0S.
> > -	 */
> > -	if (state & PCIE_LINK_STATE_L1)
> > -		pcie_config_aspm_dev(parent, upstream);
> > +	pcie_config_aspm_dev(parent, upstream);
> >  	list_for_each_entry(child, &linkbus->devices, bus_list)
> >  		pcie_config_aspm_dev(child, dwstream);
> > -	if (!(state & PCIE_LINK_STATE_L1))
> > -		pcie_config_aspm_dev(parent, upstream);
> 
> I think the reason for having pcie_config_aspm_dev(parent) both before
> and after configuring the children is because pcie_config_aspm_link()
> may be called either to enable L1 or to disable it.
> 
> I guess your change always disables ASPM completely (disabling the
> downstream (child) component first, then the upstream), and here we
> are either leaving L1 disabled or enabling it, and in either case it
> should be safe to configure the upstream (parent) component first,
> then the downstream one.
> 
> Of course, we may also enable L0s here, and AFAICS it should always be
> safe to do that in the upstream component first, followed by the
> downstream one.
> 
> Bottom line, this looks good to me, and I think it's nice that this
> removes the "parent then child" or "child then parent" logic here.
> 
Agreed with all the points.

> >  	link->aspm_enabled = state;
> >  
> > -- 
> > 2.46.1.824.gd892dcdcdd-goog
> >
Bjorn Helgaas Oct. 3, 2024, 8:23 p.m. UTC | #3
On Thu, Oct 03, 2024 at 10:53:58PM +0530, Ajay Agarwal wrote:
> On Thu, Oct 03, 2024 at 12:01:22PM -0500, Bjorn Helgaas wrote:
> > On Thu, Oct 03, 2024 at 06:55:03PM +0530, Ajay Agarwal wrote:
> > > The current sequence in the driver for L1ss update is as follows.
> > > 
> > > Disable L1ss
> > > Disable L1
> > > Enable L1ss as required
> > > Enable L1 if required
> > > 
> > > With this sequence, a bus hang is observed during the L1ss
> > > disable sequence when the RC CPU attempts to clear the RC L1ss
> > > register after clearing the EP L1ss register.
> > 
> > Thanks for this.  What exactly does the bus hang look like to a user?
> >
> The CPU is just hung on reading the RC PCI_L1SS_CTL1 register. After
> some time, the CPU watchdog expires and the system reboots.

Wow.  Good to know that this is outside the PCIe domain.  I think this
is a good change, and since it is partly motivated by hardware
behavior that might be legal but seems somewhat unusual, can we
identify the hardware (CPU and PCIe Root Complex) involved here?

> > I guess the problem happens in pcie_config_aspm_l1ss(), where we do:
> > 
> >   pci_clear_and_set_config_dword(child->l1ss + PCI_L1SS_CTL1, ... 0)
> >   pci_clear_and_set_config_dword(parent->l1ss + PCI_L1SS_CTL1, ... 0)
> > 
> > where clearing the child (endpoint) PCI_L1SS_CTL1_L1_2_MASK works, but
> > something goes wrong when clearing the parent (RP) mask?  The
> > clear_and_set will do a read followed by a write, and one of those
> > causes some kind of error?
> >
> During ASPM disable, in pcie_config_aspm_l1ss(), we do:
>    1. pci_clear_and_set_config_dword(child->l1ss + PCI_L1SS_CTL1, ... 0)
>    2. pci_clear_and_set_config_dword(parent->l1ss + PCI_L1SS_CTL1, ... 0)
>    3. pci_clear_and_set_config_dword(parent->l1ss + PCI_L1SS_CTL1, ... 0)
>    4. pci_clear_and_set_config_dword(child->l1ss + PCI_L1SS_CTL1, ... 0)
> 
> We observe that the steps 1 and 2 go through just fine. But the read of
> PCI_L1SS_CTL1 register in the step 3 hangs. I am not sure why.
> The issue is pretty difficult to reproduce, and adding prints around
> these steps masks the issue.

I guess the L1 disable is between 2 and 3, right?  And 3 and 4 may
enable L1 SS (using val, not 0)?

  1. same
  2. same
  2.5 pcie_capability_clear_word(child, PCI_EXP_LNKCTL_ASPM_L1)
  2.6 pcie_capability_clear_word(parent, PCI_EXP_LNKCTL_ASPM_L1)
  3. pci_clear_and_set_config_dword(parent->l1ss + PCI_L1SS_CTL1, ...  val)
  4. pci_clear_and_set_config_dword(child->l1ss + PCI_L1SS_CTL1, ...  val)

Bjorn

Ajay Agarwal Oct. 4, 2024, 3 a.m. UTC | #4
On Thu, Oct 03, 2024 at 03:23:21PM -0500, Bjorn Helgaas wrote:
> On Thu, Oct 03, 2024 at 10:53:58PM +0530, Ajay Agarwal wrote:
> > On Thu, Oct 03, 2024 at 12:01:22PM -0500, Bjorn Helgaas wrote:
> > > On Thu, Oct 03, 2024 at 06:55:03PM +0530, Ajay Agarwal wrote:
> > > > The current sequence in the driver for L1ss update is as follows.
> > > > 
> > > > Disable L1ss
> > > > Disable L1
> > > > Enable L1ss as required
> > > > Enable L1 if required
> > > > 
> > > > With this sequence, a bus hang is observed during the L1ss
> > > > disable sequence when the RC CPU attempts to clear the RC L1ss
> > > > register after clearing the EP L1ss register.
> > > 
> > > Thanks for this.  What exactly does the bus hang look like to a user?
> > >
> > The CPU is just hung on reading the RC PCI_L1SS_CTL1 register. After
> > some time, the CPU watchdog expires and the system reboots.
> 
> Wow.  Good to know that this is outside the PCIe domain.  I think this
> is a good change, and since it is partly motivated by hardware
> behavior that might be legal but seems somewhat unusual, can we
> identify the hardware (CPU and PCIe Root Complex) involved here?
> 
The CPU is an ARM A-core. The PCIe RC is a Synopsys Designware core.

> > > I guess the problem happens in pcie_config_aspm_l1ss(), where we do:
> > > 
> > >   pci_clear_and_set_config_dword(child->l1ss + PCI_L1SS_CTL1, ... 0)
> > >   pci_clear_and_set_config_dword(parent->l1ss + PCI_L1SS_CTL1, ... 0)
> > > 
> > > where clearing the child (endpoint) PCI_L1SS_CTL1_L1_2_MASK works, but
> > > something goes wrong when clearing the parent (RP) mask?  The
> > > clear_and_set will do a read followed by a write, and one of those
> > > causes some kind of error?
> > >
> > During ASPM disable, in pcie_config_aspm_l1ss(), we do:
> >    1. pci_clear_and_set_config_dword(child->l1ss + PCI_L1SS_CTL1, ... 0)
> >    2. pci_clear_and_set_config_dword(parent->l1ss + PCI_L1SS_CTL1, ... 0)
> >    3. pci_clear_and_set_config_dword(parent->l1ss + PCI_L1SS_CTL1, ... 0)
> >    4. pci_clear_and_set_config_dword(child->l1ss + PCI_L1SS_CTL1, ... 0)
> > 
> > We observe that the steps 1 and 2 go through just fine. But the read of
> > PCI_L1SS_CTL1 register in the step 3 hangs. I am not sure why.
> > The issue is pretty difficult to reproduce, and adding prints around
> > these steps masks the issue.
> 
> I guess the L1 disable is between 2 and 3, right?  And 3 and 4 may
> enable L1 SS (using val, not 0)?
> 
>   1. same
>   2. same
>   2.5 pcie_capability_clear_word(child, PCI_EXP_LNKCTL_ASPM_L1)
>   2.6 pcie_capability_clear_word(parent, PCI_EXP_LNKCTL_ASPM_L1)
>   3. pci_clear_and_set_config_dword(parent->l1ss + PCI_L1SS_CTL1, ...  val)
>   4. pci_clear_and_set_config_dword(child->l1ss + PCI_L1SS_CTL1, ...  val)
>
That's the sequence when L1ss is enabled. When it is disabled, then steps
2.5 and 2.6 do not run. And 'val' remains 0.
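
The reason steps 2.5 and 2.6 are skipped on the disable path follows from the enable_req expression in the pre-patch code, (link->aspm_enabled ^ state) & state. A minimal sketch of that arithmetic, assuming illustrative bit values (the DEMO_STATE_* macros and both helper functions are hypothetical and are not the kernel's PCIE_LINK_STATE_* definitions):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit values only; the kernel's values may differ. */
#define DEMO_STATE_L1	(1u << 0)
#define DEMO_STATE_L1_1	(1u << 1)
#define DEMO_STATE_L1_2	(1u << 2)
#define DEMO_STATE_ALL	(DEMO_STATE_L1 | DEMO_STATE_L1_1 | DEMO_STATE_L1_2)

/* Bits that are set in 'requested' but not in 'enabled'. */
static uint32_t newly_enabled(uint32_t enabled, uint32_t requested)
{
	/* Only the bits modeled above are valid inputs. */
	assert((enabled & ~DEMO_STATE_ALL) == 0);
	assert((requested & ~DEMO_STATE_ALL) == 0);
	return (enabled ^ requested) & requested;
}

/* Nonzero when the pre-patch check would clear L1 before programming L1SS. */
static int old_code_disables_l1(uint32_t enabled, uint32_t requested)
{
	return (newly_enabled(enabled, requested) &
		(DEMO_STATE_L1_1 | DEMO_STATE_L1_2)) != 0;
}
```

With a requested state of 0 (full disable), newly_enabled() is always 0, so the L1 clear is never reached and the L1SS registers are written while L1 may still be enabled.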

> Bjorn

Bjorn Helgaas Oct. 4, 2024, 11:19 p.m. UTC | #5
On Thu, Oct 03, 2024 at 06:55:03PM +0530, Ajay Agarwal wrote:
> The current sequence in the driver for L1ss update is as follows.
> 
> Disable L1ss
> Disable L1
> Enable L1ss as required
> Enable L1 if required
> 
> With this sequence, a bus hang is observed during the L1ss
> disable sequence when the RC CPU attempts to clear the RC L1ss
> register after clearing the EP L1ss register. It looks like the
> RC attempts to enter L1ss again and at the same time, access to
> RC L1ss register fails because aux clk is still not active.
>
> PCIe spec r6.2, section 5.5.4, recommends that setting either
> or both of the enable bits for ASPM L1 PM Substates must be done
> while ASPM L1 is disabled. My interpretation here is that
> clearing L1ss should also be done when L1 is disabled. Thereby,
> change the sequence as follows.
> 
> Disable L1
> Disable L1ss
> Enable L1ss as required
> Enable L1 if required

I think we also write the L1.2 enable bits in PCI_L1SS_CTL1 in
aspm_calc_l12_info() when ASPM L1 may be enabled:

  pcie_aspm_init_link_state
    pcie_aspm_cap_init
      pcie_capability_read_word(PCI_EXP_LNKCTL)
      aspm_l1ss_init
        aspm_calc_l12_info
          pci_clear_and_set_config_dword(PCI_L1SS_CTL1, PCI_L1SS_CTL1_L1_2_MASK)

That looks like another path where we should make a similar change.
What do you think?

Bjorn

Ajay Agarwal Oct. 7, 2024, 3:21 a.m. UTC | #6
On Fri, Oct 04, 2024 at 06:19:28PM -0500, Bjorn Helgaas wrote:
> On Thu, Oct 03, 2024 at 06:55:03PM +0530, Ajay Agarwal wrote:
> > The current sequence in the driver for L1ss update is as follows.
> > 
> > Disable L1ss
> > Disable L1
> > Enable L1ss as required
> > Enable L1 if required
> > 
> > With this sequence, a bus hang is observed during the L1ss
> > disable sequence when the RC CPU attempts to clear the RC L1ss
> > register after clearing the EP L1ss register. It looks like the
> > RC attempts to enter L1ss again and at the same time, access to
> > RC L1ss register fails because aux clk is still not active.
> >
> > PCIe spec r6.2, section 5.5.4, recommends that setting either
> > or both of the enable bits for ASPM L1 PM Substates must be done
> > while ASPM L1 is disabled. My interpretation here is that
> > clearing L1ss should also be done when L1 is disabled. Thereby,
> > change the sequence as follows.
> > 
> > Disable L1
> > Disable L1ss
> > Enable L1ss as required
> > Enable L1 if required
> 
> I think we also write the L1.2 enable bits in PCI_L1SS_CTL1 in
> aspm_calc_l12_info() when ASPM L1 may be enabled:
> 
>   pcie_aspm_init_link_state
>     pcie_aspm_cap_init
>       pcie_capability_read_word(PCI_EXP_LNKCTL)
>       aspm_l1ss_init
>         aspm_calc_l12_info
>           pci_clear_and_set_config_dword(PCI_L1SS_CTL1, PCI_L1SS_CTL1_L1_2_MASK)
> 
> That looks like another path where we should make a similar change.
> What do you think?
>
I agree. We should make a similar change there. Thanks for pointing out.
Will make the change in the next version.
> Bjorn

Patch

diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index cee2365e54b8..c172886129f3 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -848,17 +848,13 @@  static void pcie_aspm_cap_init(struct pcie_link_state *link, int blacklist)
 /* Configure the ASPM L1 substates */
 static void pcie_config_aspm_l1ss(struct pcie_link_state *link, u32 state)
 {
-	u32 val, enable_req;
+	u32 val;
 	struct pci_dev *child = link->downstream, *parent = link->pdev;
 
-	enable_req = (link->aspm_enabled ^ state) & state;
-
 	/*
-	 * Here are the rules specified in the PCIe spec for enabling L1SS:
+	 * Spec r6.2, section 5.5.4, mentions the rules for enabling L1SS:
 	 * - When enabling L1.x, enable bit at parent first, then at child
 	 * - When disabling L1.x, disable bit at child first, then at parent
-	 * - When enabling ASPM L1.x, need to disable L1
-	 *   (at child followed by parent).
 	 * - The ASPM/PCIPM L1.2 must be disabled while programming timing
 	 *   parameters
 	 *
@@ -871,16 +867,6 @@  static void pcie_config_aspm_l1ss(struct pcie_link_state *link, u32 state)
 				       PCI_L1SS_CTL1_L1SS_MASK, 0);
 	pci_clear_and_set_config_dword(parent, parent->l1ss + PCI_L1SS_CTL1,
 				       PCI_L1SS_CTL1_L1SS_MASK, 0);
-	/*
-	 * If needed, disable L1, and it gets enabled later
-	 * in pcie_config_aspm_link().
-	 */
-	if (enable_req & (PCIE_LINK_STATE_L1_1 | PCIE_LINK_STATE_L1_2)) {
-		pcie_capability_clear_word(child, PCI_EXP_LNKCTL,
-					   PCI_EXP_LNKCTL_ASPM_L1);
-		pcie_capability_clear_word(parent, PCI_EXP_LNKCTL,
-					   PCI_EXP_LNKCTL_ASPM_L1);
-	}
 
 	val = 0;
 	if (state & PCIE_LINK_STATE_L1_1)
@@ -937,21 +923,33 @@  static void pcie_config_aspm_link(struct pcie_link_state *link, u32 state)
 		dwstream |= PCI_EXP_LNKCTL_ASPM_L1;
 	}
 
+	/*
+	 * Spec r6.2, section 5.5.4, recommends that setting either or both of
+	 * the enable bits for ASPM L1 PM Substates must be done while ASPM L1
+	 * is disabled. So disable L1 here, and it gets enabled later after the
+	 * L1ss configuration has been completed.
+	 *
+	 * Spec r6.2, section 7.5.3.7, mentions that ASPM L1 must be enabled by
+	 * software in the Upstream component on a Link prior to enabling ASPM
+	 * L1 in the Downstream component on the Link. When disabling L1,
+	 * software must disable ASPM L1 in the Downstream component on a Link
+	 * prior to disabling ASPM L1 in the Upstream component on that Link.
+	 *
+	 * Spec doesn't mention L0s.
+	 *
+	 * Disable L1 and L0s here, and they get enabled later after the L1ss
+	 * configuration has been completed.
+	 */
+	list_for_each_entry(child, &linkbus->devices, bus_list)
+		pcie_config_aspm_dev(child, 0);
+	pcie_config_aspm_dev(parent, 0);
+
 	if (link->aspm_capable & PCIE_LINK_STATE_L1SS)
 		pcie_config_aspm_l1ss(link, state);
 
-	/*
-	 * Spec 2.0 suggests all functions should be configured the
-	 * same setting for ASPM. Enabling ASPM L1 should be done in
-	 * upstream component first and then downstream, and vice
-	 * versa for disabling ASPM L1. Spec doesn't mention L0S.
-	 */
-	if (state & PCIE_LINK_STATE_L1)
-		pcie_config_aspm_dev(parent, upstream);
+	pcie_config_aspm_dev(parent, upstream);
 	list_for_each_entry(child, &linkbus->devices, bus_list)
 		pcie_config_aspm_dev(child, dwstream);
-	if (!(state & PCIE_LINK_STATE_L1))
-		pcie_config_aspm_dev(parent, upstream);
 
 	link->aspm_enabled = state;