Message ID | 5748066.MhkbZ0Pkbq@kreacher (mailing list archive) |
---|---|
State | Accepted |
Commit | f0881d38c7eca3351f59d551604aaf74283c2e13 |
Headers | show |
Series | PCI/PM: Rework powering up PCI devices | expand |
On Thu, May 05, 2022 at 08:10:43PM +0200, Rafael J. Wysocki wrote: > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > Make pci_power_up() write 0 to the device's PCI_PM_CTRL register in > order to put it into D0 regardless of the power state returned by > the previous read from that register which should not matter. > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > --- > drivers/pci/pci.c | 11 +++-------- > 1 file changed, 3 insertions(+), 8 deletions(-) > > Index: linux-pm/drivers/pci/pci.c > =================================================================== > --- linux-pm.orig/drivers/pci/pci.c > +++ linux-pm/drivers/pci/pci.c > @@ -1230,15 +1230,10 @@ int pci_power_up(struct pci_dev *dev) > } > > /* > - * If we're (effectively) in D3, force entire word to 0. This doesn't > - * affect PME_Status, disables PME_En, and sets PowerState to 0. > + * Force the entire word to 0. This doesn't affect PME_Status, disables > + * PME_En, and sets PowerState to 0. > */ > - if (state == PCI_D3hot) > - pmcsr = 0; > - else > - pmcsr &= ~PCI_PM_CTRL_STATE_MASK; > - > - pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr); > + pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, 0); Can you reassure me why this is safe and useful? This is a 16-bit write that includes (PCIe r6.0, sec 7.5.2.2): 0x0003 PowerState RW 0x0004 RsvdP 0x0008 No_Soft_Reset RO 0x00f0 RsvdP 0x0100 PME_En RW/RWS 0x1e00 Data_Select RW, VF ROZ 0x6000 Data_Scale RO, VF ROZ 0x8000 PME_Status RW1CS We intend to set PowerState to 0 (D0), apparently intend to clear PME_En, and PME_Status is "write 1 to clear" to writing 0 does nothing, so those look OK. But the RsvdP fields are reserved for future RW bits and should be preserved, and it looks like clearing Data_Select could potentially break the Data Register power consumption reporting (which I don't think we support today). It seems like maybe we should do this instead: pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr & ~PCI_PM_CTRL_STATE_MASK) to just unconditionally clear PowerState?
On Thu, May 26, 2022 at 11:54:22AM -0500, Bjorn Helgaas wrote: > On Thu, May 05, 2022 at 08:10:43PM +0200, Rafael J. Wysocki wrote: > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > > Make pci_power_up() write 0 to the device's PCI_PM_CTRL register in > > order to put it into D0 regardless of the power state returned by > > the previous read from that register which should not matter. > > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > --- > > drivers/pci/pci.c | 11 +++-------- > > 1 file changed, 3 insertions(+), 8 deletions(-) > > > > Index: linux-pm/drivers/pci/pci.c > > =================================================================== > > --- linux-pm.orig/drivers/pci/pci.c > > +++ linux-pm/drivers/pci/pci.c > > @@ -1230,15 +1230,10 @@ int pci_power_up(struct pci_dev *dev) > > } > > > > /* > > - * If we're (effectively) in D3, force entire word to 0. This doesn't > > - * affect PME_Status, disables PME_En, and sets PowerState to 0. > > + * Force the entire word to 0. This doesn't affect PME_Status, disables > > + * PME_En, and sets PowerState to 0. > > */ > > - if (state == PCI_D3hot) > > - pmcsr = 0; > > - else > > - pmcsr &= ~PCI_PM_CTRL_STATE_MASK; > > - > > - pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr); > > + pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, 0); > > Can you reassure me why this is safe and useful? > > This is a 16-bit write that includes (PCIe r6.0, sec 7.5.2.2): > > 0x0003 PowerState RW > 0x0004 RsvdP > 0x0008 No_Soft_Reset RO > 0x00f0 RsvdP > 0x0100 PME_En RW/RWS > 0x1e00 Data_Select RW, VF ROZ > 0x6000 Data_Scale RO, VF ROZ > 0x8000 PME_Status RW1CS > > We intend to set PowerState to 0 (D0), apparently intend to clear > PME_En, and PME_Status is "write 1 to clear" to writing 0 does > nothing, so those look OK. > > But the RsvdP fields are reserved for future RW bits and should be > preserved, and it looks like clearing Data_Select could potentially > break the Data Register power consumption reporting (which I don't > think we support today). > > It seems like maybe we should do this instead: > > pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, > pmcsr & ~PCI_PM_CTRL_STATE_MASK) > > to just unconditionally clear PowerState? Or I guess this, since we want to clear PME_En as well? pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr & ~(PCI_PM_CTRL_STATE_MASK | PCI_PM_CTRL_PME_ENABLE));
On Thu, May 26, 2022 at 9:46 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Thu, May 26, 2022 at 11:54:22AM -0500, Bjorn Helgaas wrote: > > On Thu, May 05, 2022 at 08:10:43PM +0200, Rafael J. Wysocki wrote: > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > > > > Make pci_power_up() write 0 to the device's PCI_PM_CTRL register in > > > order to put it into D0 regardless of the power state returned by > > > the previous read from that register which should not matter. > > > > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > --- > > > drivers/pci/pci.c | 11 +++-------- > > > 1 file changed, 3 insertions(+), 8 deletions(-) > > > > > > Index: linux-pm/drivers/pci/pci.c > > > =================================================================== > > > --- linux-pm.orig/drivers/pci/pci.c > > > +++ linux-pm/drivers/pci/pci.c > > > @@ -1230,15 +1230,10 @@ int pci_power_up(struct pci_dev *dev) > > > } > > > > > > /* > > > - * If we're (effectively) in D3, force entire word to 0. This doesn't > > > - * affect PME_Status, disables PME_En, and sets PowerState to 0. > > > + * Force the entire word to 0. This doesn't affect PME_Status, disables > > > + * PME_En, and sets PowerState to 0. > > > */ > > > - if (state == PCI_D3hot) > > > - pmcsr = 0; > > > - else > > > - pmcsr &= ~PCI_PM_CTRL_STATE_MASK; > > > - > > > - pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr); > > > + pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, 0); > > > > Can you reassure me why this is safe and useful? > > > > This is a 16-bit write that includes (PCIe r6.0, sec 7.5.2.2): > > > > 0x0003 PowerState RW > > 0x0004 RsvdP > > 0x0008 No_Soft_Reset RO > > 0x00f0 RsvdP > > 0x0100 PME_En RW/RWS > > 0x1e00 Data_Select RW, VF ROZ > > 0x6000 Data_Scale RO, VF ROZ > > 0x8000 PME_Status RW1CS > > > > We intend to set PowerState to 0 (D0), apparently intend to clear > > PME_En, and PME_Status is "write 1 to clear" to writing 0 does > > nothing, so those look OK. > > > > But the RsvdP fields are reserved for future RW bits and should be > > preserved, and it looks like clearing Data_Select could potentially > > break the Data Register power consumption reporting (which I don't > > think we support today). > > > > It seems like maybe we should do this instead: > > > > pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, > > pmcsr & ~PCI_PM_CTRL_STATE_MASK) > > > > to just unconditionally clear PowerState? > > Or I guess this, since we want to clear PME_En as well? > > pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr & > ~(PCI_PM_CTRL_STATE_MASK | PCI_PM_CTRL_PME_ENABLE)); Yes. Also, this patch actually only makes a difference if the device is going into D0 from D1 or D2, because we have always written 0 to the PMCSR during transitions from D3hot. It is inconsistent and confusing to do different things depending on the initial power state here and the code is simpler when 0 is written regardless.
On Fri, May 27, 2022 at 08:52:17PM +0200, Rafael J. Wysocki wrote: > On Thu, May 26, 2022 at 9:46 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > > On Thu, May 26, 2022 at 11:54:22AM -0500, Bjorn Helgaas wrote: > > > On Thu, May 05, 2022 at 08:10:43PM +0200, Rafael J. Wysocki wrote: > > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > > > > > > Make pci_power_up() write 0 to the device's PCI_PM_CTRL register in > > > > order to put it into D0 regardless of the power state returned by > > > > the previous read from that register which should not matter. > > > > > > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > > --- > > > > drivers/pci/pci.c | 11 +++-------- > > > > 1 file changed, 3 insertions(+), 8 deletions(-) > > > > > > > > Index: linux-pm/drivers/pci/pci.c > > > > =================================================================== > > > > --- linux-pm.orig/drivers/pci/pci.c > > > > +++ linux-pm/drivers/pci/pci.c > > > > @@ -1230,15 +1230,10 @@ int pci_power_up(struct pci_dev *dev) > > > > } > > > > > > > > /* > > > > - * If we're (effectively) in D3, force entire word to 0. This doesn't > > > > - * affect PME_Status, disables PME_En, and sets PowerState to 0. > > > > + * Force the entire word to 0. This doesn't affect PME_Status, disables > > > > + * PME_En, and sets PowerState to 0. > > > > */ > > > > - if (state == PCI_D3hot) > > > > - pmcsr = 0; > > > > - else > > > > - pmcsr &= ~PCI_PM_CTRL_STATE_MASK; > > > > - > > > > - pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr); > > > > + pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, 0); > > > > > > Can you reassure me why this is safe and useful? > > > > > > This is a 16-bit write that includes (PCIe r6.0, sec 7.5.2.2): > > > > > > 0x0003 PowerState RW > > > 0x0004 RsvdP > > > 0x0008 No_Soft_Reset RO > > > 0x00f0 RsvdP > > > 0x0100 PME_En RW/RWS > > > 0x1e00 Data_Select RW, VF ROZ > > > 0x6000 Data_Scale RO, VF ROZ > > > 0x8000 PME_Status RW1CS > > > > > > We intend to set PowerState to 0 (D0), apparently intend to clear > > > PME_En, and PME_Status is "write 1 to clear" to writing 0 does > > > nothing, so those look OK. > > > > > > But the RsvdP fields are reserved for future RW bits and should be > > > preserved, and it looks like clearing Data_Select could potentially > > > break the Data Register power consumption reporting (which I don't > > > think we support today). > > > > > > It seems like maybe we should do this instead: > > > > > > pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, > > > pmcsr & ~PCI_PM_CTRL_STATE_MASK) > > > > > > to just unconditionally clear PowerState? > > > > Or I guess this, since we want to clear PME_En as well? > > > > pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr & > > ~(PCI_PM_CTRL_STATE_MASK | PCI_PM_CTRL_PME_ENABLE)); > > Yes. > > Also, this patch actually only makes a difference if the device is > going into D0 from D1 or D2, because we have always written 0 to the > PMCSR during transitions from D3hot. > > It is inconsistent and confusing to do different things depending on > the initial power state here and the code is simpler when 0 is written > regardless. I agree that depending on the initial power state is confusing (it confused me :)). What would you think of replacing this patch with the one below? commit defde70748bc ("PCI/PM: Always put device in D0 and disable PME in pci_power_up()") Author: Bjorn Helgaas <bhelgaas@google.com> Date: Fri May 27 17:45:07 2022 -0500 PCI/PM: Always put device in D0 and disable PME in pci_power_up() Unconditionally put the device in PCI_D0 and disable PME generation in pci_power_up(), regardless of the power state returned by the previous read from PCI_PM_CTRL, which should not matter. Based-on-patch-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/5748066.MhkbZ0Pkbq@kreacher Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index a5b93f85377a..8e42a9dc1944 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -1229,14 +1229,9 @@ int pci_power_up(struct pci_dev *dev) goto end; } - /* - * If we're (effectively) in D3, force entire word to 0. This doesn't - * affect PME_Status, disables PME_En, and sets PowerState to 0. - */ - if (state == PCI_D3hot) - pmcsr = 0; - else - pmcsr &= ~PCI_PM_CTRL_STATE_MASK; + /* Set PowerState to 0 (PCI_D0) and disable PME generation */ + pmcsr &= ~PCI_PM_CTRL_STATE_MASK; + pmcsr &= ~PCI_PM_CTRL_PME_ENABLE; pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);
On Fri, May 27, 2022 at 05:51:48PM -0500, Bjorn Helgaas wrote: > On Fri, May 27, 2022 at 08:52:17PM +0200, Rafael J. Wysocki wrote: > > On Thu, May 26, 2022 at 9:46 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > > > > On Thu, May 26, 2022 at 11:54:22AM -0500, Bjorn Helgaas wrote: > > > > On Thu, May 05, 2022 at 08:10:43PM +0200, Rafael J. Wysocki wrote: > > > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > > > > > > > > Make pci_power_up() write 0 to the device's PCI_PM_CTRL register in > > > > > order to put it into D0 regardless of the power state returned by > > > > > the previous read from that register which should not matter. > > > > > > > > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > > > --- > > > > > drivers/pci/pci.c | 11 +++-------- > > > > > 1 file changed, 3 insertions(+), 8 deletions(-) > > > > > > > > > > Index: linux-pm/drivers/pci/pci.c > > > > > =================================================================== > > > > > --- linux-pm.orig/drivers/pci/pci.c > > > > > +++ linux-pm/drivers/pci/pci.c > > > > > @@ -1230,15 +1230,10 @@ int pci_power_up(struct pci_dev *dev) > > > > > } > > > > > > > > > > /* > > > > > - * If we're (effectively) in D3, force entire word to 0. This doesn't > > > > > - * affect PME_Status, disables PME_En, and sets PowerState to 0. > > > > > + * Force the entire word to 0. This doesn't affect PME_Status, disables > > > > > + * PME_En, and sets PowerState to 0. > > > > > */ > > > > > - if (state == PCI_D3hot) > > > > > - pmcsr = 0; > > > > > - else > > > > > - pmcsr &= ~PCI_PM_CTRL_STATE_MASK; > > > > > - > > > > > - pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr); > > > > > + pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, 0); > > > > > > > > Can you reassure me why this is safe and useful? > > > > > > > > This is a 16-bit write that includes (PCIe r6.0, sec 7.5.2.2): > > > > > > > > 0x0003 PowerState RW > > > > 0x0004 RsvdP > > > > 0x0008 No_Soft_Reset RO > > > > 0x00f0 RsvdP > > > > 0x0100 PME_En RW/RWS > > > > 0x1e00 Data_Select RW, VF ROZ > > > > 0x6000 Data_Scale RO, VF ROZ > > > > 0x8000 PME_Status RW1CS > > > > > > > > We intend to set PowerState to 0 (D0), apparently intend to clear > > > > PME_En, and PME_Status is "write 1 to clear" to writing 0 does > > > > nothing, so those look OK. > > > > > > > > But the RsvdP fields are reserved for future RW bits and should be > > > > preserved, and it looks like clearing Data_Select could potentially > > > > break the Data Register power consumption reporting (which I don't > > > > think we support today). > > > > > > > > It seems like maybe we should do this instead: > > > > > > > > pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, > > > > pmcsr & ~PCI_PM_CTRL_STATE_MASK) > > > > > > > > to just unconditionally clear PowerState? > > > > > > Or I guess this, since we want to clear PME_En as well? > > > > > > pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr & > > > ~(PCI_PM_CTRL_STATE_MASK | PCI_PM_CTRL_PME_ENABLE)); > > > > Yes. > > > > Also, this patch actually only makes a difference if the device is > > going into D0 from D1 or D2, because we have always written 0 to the > > PMCSR during transitions from D3hot. > > > > It is inconsistent and confusing to do different things depending on > > the initial power state here and the code is simpler when 0 is written > > regardless. > > I agree that depending on the initial power state is confusing (it > confused me :)). > > What would you think of replacing this patch with the one below? Well, I don't know why I sent this, since I had already sent the pull request. Not thinking clearly, I guess. Anyway, your original patch is now upstream. Sorry for the distraction. Bjorn
On Sat, May 28, 2022 at 1:09 AM Bjorn Helgaas <helgaas@kernel.org> wrote: > > On Fri, May 27, 2022 at 05:51:48PM -0500, Bjorn Helgaas wrote: > > On Fri, May 27, 2022 at 08:52:17PM +0200, Rafael J. Wysocki wrote: > > > On Thu, May 26, 2022 at 9:46 PM Bjorn Helgaas <helgaas@kernel.org> wrote: > > > > > > > > On Thu, May 26, 2022 at 11:54:22AM -0500, Bjorn Helgaas wrote: > > > > > On Thu, May 05, 2022 at 08:10:43PM +0200, Rafael J. Wysocki wrote: > > > > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > > > > > > > > > > Make pci_power_up() write 0 to the device's PCI_PM_CTRL register in > > > > > > order to put it into D0 regardless of the power state returned by > > > > > > the previous read from that register which should not matter. > > > > > > > > > > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > > > > --- > > > > > > drivers/pci/pci.c | 11 +++-------- > > > > > > 1 file changed, 3 insertions(+), 8 deletions(-) > > > > > > > > > > > > Index: linux-pm/drivers/pci/pci.c > > > > > > =================================================================== > > > > > > --- linux-pm.orig/drivers/pci/pci.c > > > > > > +++ linux-pm/drivers/pci/pci.c > > > > > > @@ -1230,15 +1230,10 @@ int pci_power_up(struct pci_dev *dev) > > > > > > } > > > > > > > > > > > > /* > > > > > > - * If we're (effectively) in D3, force entire word to 0. This doesn't > > > > > > - * affect PME_Status, disables PME_En, and sets PowerState to 0. > > > > > > + * Force the entire word to 0. This doesn't affect PME_Status, disables > > > > > > + * PME_En, and sets PowerState to 0. > > > > > > */ > > > > > > - if (state == PCI_D3hot) > > > > > > - pmcsr = 0; > > > > > > - else > > > > > > - pmcsr &= ~PCI_PM_CTRL_STATE_MASK; > > > > > > - > > > > > > - pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr); > > > > > > + pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, 0); > > > > > > > > > > Can you reassure me why this is safe and useful? > > > > > > > > > > This is a 16-bit write that includes (PCIe r6.0, sec 7.5.2.2): > > > > > > > > > > 0x0003 PowerState RW > > > > > 0x0004 RsvdP > > > > > 0x0008 No_Soft_Reset RO > > > > > 0x00f0 RsvdP > > > > > 0x0100 PME_En RW/RWS > > > > > 0x1e00 Data_Select RW, VF ROZ > > > > > 0x6000 Data_Scale RO, VF ROZ > > > > > 0x8000 PME_Status RW1CS > > > > > > > > > > We intend to set PowerState to 0 (D0), apparently intend to clear > > > > > PME_En, and PME_Status is "write 1 to clear" to writing 0 does > > > > > nothing, so those look OK. > > > > > > > > > > But the RsvdP fields are reserved for future RW bits and should be > > > > > preserved, and it looks like clearing Data_Select could potentially > > > > > break the Data Register power consumption reporting (which I don't > > > > > think we support today). > > > > > > > > > > It seems like maybe we should do this instead: > > > > > > > > > > pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, > > > > > pmcsr & ~PCI_PM_CTRL_STATE_MASK) > > > > > > > > > > to just unconditionally clear PowerState? > > > > > > > > Or I guess this, since we want to clear PME_En as well? > > > > > > > > pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr & > > > > ~(PCI_PM_CTRL_STATE_MASK | PCI_PM_CTRL_PME_ENABLE)); > > > > > > Yes. > > > > > > Also, this patch actually only makes a difference if the device is > > > going into D0 from D1 or D2, because we have always written 0 to the > > > PMCSR during transitions from D3hot. > > > > > > It is inconsistent and confusing to do different things depending on > > > the initial power state here and the code is simpler when 0 is written > > > regardless. > > > > I agree that depending on the initial power state is confusing (it > > confused me :)). > > > > What would you think of replacing this patch with the one below? > > Well, I don't know why I sent this, since I had already sent the pull > request. Not thinking clearly, I guess. Anyway, your original patch > is now upstream. Sorry for the distraction. No biggie. If it turns out to be problematic, it can be changed to preserving the reserved bits easily enough.
Index: linux-pm/drivers/pci/pci.c =================================================================== --- linux-pm.orig/drivers/pci/pci.c +++ linux-pm/drivers/pci/pci.c @@ -1230,15 +1230,10 @@ int pci_power_up(struct pci_dev *dev) } /* - * If we're (effectively) in D3, force entire word to 0. This doesn't - * affect PME_Status, disables PME_En, and sets PowerState to 0. + * Force the entire word to 0. This doesn't affect PME_Status, disables + * PME_En, and sets PowerState to 0. */ - if (state == PCI_D3hot) - pmcsr = 0; - else - pmcsr &= ~PCI_PM_CTRL_STATE_MASK; - - pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr); + pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, 0); /* Mandatory transition delays; see PCI PM 1.2. */ if (state == PCI_D3hot)