diff mbox series

PCI: Disable D3cold on Asus B1400 PCI-NVMe bridge

Message ID 20240130183124.19985-1-drake@endlessos.org (mailing list archive)
State Superseded
Headers show
Series PCI: Disable D3cold on Asus B1400 PCI-NVMe bridge | expand

Commit Message

Daniel Drake Jan. 30, 2024, 6:31 p.m. UTC
The Asus B1400 with production shipped firmware version 304 and VMD
disabled cannot resume from suspend: the NVMe device becomes unresponsive
and inaccessible.

This is because the NVMe device and parent PCI bridge get put into D3cold
during suspend, and this PCI bridge cannot be recovered from D3cold mode:

  echo "0000:01:00.0" > /sys/bus/pci/drivers/nvme/unbind
  echo "0000:00:06.0" > /sys/bus/pci/drivers/pcieport/unbind
  setpci -s 00:06.0 CAP_PM+4.b=03 # D3hot
  acpidbg -b "execute \_SB.PC00.PEG0.PXP._OFF"
  acpidbg -b "execute \_SB.PC00.PEG0.PXP._ON"
  setpci -s 00:06.0 CAP_PM+4.b=0 # D0
  echo "0000:00:06.0" > /sys/bus/pci/drivers/pcieport/bind
  echo "0000:01:00.0" > /sys/bus/pci/drivers/nvme/bind
  # NVMe probe fails here with -ENODEV

This appears to be an untested D3cold transition by the vendor; Intel
socwatch shows that Windows leaves the NVMe device and parent bridge in D0
during suspend, even though this firmware version has StorageD3Enable=1.

Experimenting with the DSDT, the _OFF method calls DL23() which sets a L23E
bit at offset 0xe2 into the PCI configuration space for this root port.
This is the specific write that the _ON routine is unable to recover from.
This register is not documented in the public chipset datasheet.

Disallow D3cold on the PCI bridge to enable successful suspend/resume.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=215742
Signed-off-by: Daniel Drake <drake@endlessos.org>
---
 arch/x86/pci/fixup.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

There's an existing quirk for this product that selects S3 instead of
s2idle because of this failure. However, after extensive testing,
we've found that S3 cannot reliably wake up from suspend (firmware issue).
We need to get s2idle working again; I'll revert the original quirk
once we have solved this D3cold issue.

Comments

Mario Limonciello Jan. 30, 2024, 6:47 p.m. UTC | #1
On 1/30/2024 12:31, Daniel Drake wrote:
> The Asus B1400 with production shipped firmware version 304 and VMD
> disabled cannot resume from suspend: the NVMe device becomes unresponsive
> and inaccessible.

Has there already been a check done whether any newer firmware is 
available from ASUS that doesn't suffer the deficiency described below?

> 
> This is because the NVMe device and parent PCI bridge get put into D3cold
> during suspend, and this PCI bridge cannot be recovered from D3cold mode:
> 
>    echo "0000:01:00.0" > /sys/bus/pci/drivers/nvme/unbind
>    echo "0000:00:06.0" > /sys/bus/pci/drivers/pcieport/unbind
>    setpci -s 00:06.0 CAP_PM+4.b=03 # D3hot
>    acpidbg -b "execute \_SB.PC00.PEG0.PXP._OFF"
>    acpidbg -b "execute \_SB.PC00.PEG0.PXP._ON"
>    setpci -s 00:06.0 CAP_PM+4.b=0 # D0
>    echo "0000:00:06.0" > /sys/bus/pci/drivers/pcieport/bind
>    echo "0000:01:00.0" > /sys/bus/pci/drivers/nvme/bind
>    # NVMe probe fails here with -ENODEV
> 
> This appears to be an untested D3cold transition by the vendor; Intel
> socwatch shows that Windows leaves the NVMe device and parent bridge in D0
> during suspend, even though this firmware version has StorageD3Enable=1.
> 
> Experimenting with the DSDT, the _OFF method calls DL23() which sets a L23E
> bit at offset 0xe2 into the PCI configuration space for this root port.
> This is the specific write that the _ON routine is unable to recover from.
> This register is not documented in the public chipset datasheet.
> 
> Disallow D3cold on the PCI bridge to enable successful suspend/resume.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=215742
> Signed-off-by: Daniel Drake <drake@endlessos.org>
> ---
>   arch/x86/pci/fixup.c | 26 ++++++++++++++++++++++++++
>   1 file changed, 26 insertions(+)
> 
> There's an existing quirk for this product that selects S3 instead of
> s2idle because of this failure. However, after extensive testing,
> we've found that S3 cannot reliably wake up from suspend (firmware issue).
> We need to get s2idle working again; I'll revert the original quirk
> once we have solved this D3cold issue.

Is this the only problem blocking s2idle from working effectively on 
this platform?  If so, I would think you want to just do the revert in 
the same series if it's decided this patch needs to spin again for any 
reason.

> 
> diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
> index f347c20247d30..c486d86bb678b 100644
> --- a/arch/x86/pci/fixup.c
> +++ b/arch/x86/pci/fixup.c
> @@ -907,6 +907,32 @@ static void chromeos_fixup_apl_pci_l1ss_capability(struct pci_dev *dev)
>   DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x5ad6, chromeos_save_apl_pci_l1ss_capability);
>   DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_INTEL, 0x5ad6, chromeos_fixup_apl_pci_l1ss_capability);
>   
> +/*
> + * Disable D3cold on Asus B1400 PCIe bridge at 00:06.0.
> + *
> + * On this platform with VMD off, the NVMe's parent PCI bridge cannot
> + * successfully power back on from D3cold, resulting in unresponsive NVMe on
> + * resume. This appears to be an untested transition by the vendor: Windows
> + * leaves the NVMe and parent bridge in D0 during suspend, even though
> + * StorageD3Enable=1 is set on the production shipped firmware version 304.
> + */
> +static const struct dmi_system_id asus_nvme_broken_d3cold_table[] = {
> +	{
> +		.matches = {
> +				DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."),
> +				DMI_MATCH(DMI_PRODUCT_NAME, "ASUS EXPERTBOOK B1400CEAE"),
> +		},
> +	},
> +	{}
> +};
> +
> +static void asus_disable_nvme_d3cold(struct pci_dev *pdev)
> +{
> +	if (dmi_check_system(asus_nvme_broken_d3cold_table) > 0)
> +		pci_d3cold_disable(pdev);
> +}
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x9a09, asus_disable_nvme_d3cold);
> +
>   #ifdef CONFIG_SUSPEND
>   /*
>    * Root Ports on some AMD SoCs advertise PME_Support for D3hot and D3cold, but
Daniel Drake Jan. 30, 2024, 7 p.m. UTC | #2
On Tue, Jan 30, 2024 at 2:47 PM Mario Limonciello
<mario.limonciello@amd.com> wrote:
> Has there already been a check done whether any newer firmware is
> available from ASUS that doesn't suffer the deficiency described below?

The latest firmware does flip StorageD3Enable to 0, which has the side
effect of never putting the NVMe device or the parent bridge into
D3cold.
However, we have shipped hundreds of these devices with the original
production BIOS version to first time computer users, so it is not
feasible to ask the end user to upgrade. And there is no
Linux-compatible online firmware update for this product range. Hence
a Linux-level workaround for this issue would be highly valuable.

> Is this the only problem blocking s2idle from working effectively on
> this platform?  If so, I would think you want to just do the revert in
> the same series if it's decided this patch needs to spin again for any
> reason.

Yes these could be combined into the same series, with agreement from
the drivers/acpi/ maintainers for the S3 revert.

Thanks
Daniel
Mario Limonciello Jan. 30, 2024, 7:10 p.m. UTC | #3
On 1/30/2024 13:00, Daniel Drake wrote:
> On Tue, Jan 30, 2024 at 2:47 PM Mario Limonciello
> <mario.limonciello@amd.com> wrote:
>> Has there already been a check done whether any newer firmware is
>> available from ASUS that doesn't suffer the deficiency described below?
> 
> The latest firmware does flip StorageD3Enable to 0, which has the side
> effect of never putting the NVMe device or the parent bridge into
> D3cold.
> However, we have shipped hundreds of these devices with the original
> production BIOS version to first time computer users, so it is not
> feasible to ask the end user to upgrade. And there is no
> Linux-compatible online firmware update for this product range. Hence
> a Linux-level workaround for this issue would be highly valuable.

Sure; it makes sense to me why to put a workaround in Linux with your 
above explanation.

Perhaps it make sense to limit the quirk to only the DMI version strings 
of the initial production firmware?

> 
>> Is this the only problem blocking s2idle from working effectively on
>> this platform?  If so, I would think you want to just do the revert in
>> the same series if it's decided this patch needs to spin again for any
>> reason.
> 
> Yes these could be combined into the same series, with agreement from
> the drivers/acpi/ maintainers for the S3 revert.
> 
> Thanks
> Daniel
diff mbox series

Patch

diff --git a/arch/x86/pci/fixup.c b/arch/x86/pci/fixup.c
index f347c20247d30..c486d86bb678b 100644
--- a/arch/x86/pci/fixup.c
+++ b/arch/x86/pci/fixup.c
@@ -907,6 +907,32 @@  static void chromeos_fixup_apl_pci_l1ss_capability(struct pci_dev *dev)
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x5ad6, chromeos_save_apl_pci_l1ss_capability);
 DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_INTEL, 0x5ad6, chromeos_fixup_apl_pci_l1ss_capability);
 
+/*
+ * Disable D3cold on Asus B1400 PCIe bridge at 00:06.0.
+ *
+ * On this platform with VMD off, the NVMe's parent PCI bridge cannot
+ * successfully power back on from D3cold, resulting in unresponsive NVMe on
+ * resume. This appears to be an untested transition by the vendor: Windows
+ * leaves the NVMe and parent bridge in D0 during suspend, even though
+ * StorageD3Enable=1 is set on the production shipped firmware version 304.
+ */
+static const struct dmi_system_id asus_nvme_broken_d3cold_table[] = {
+	{
+		.matches = {
+				DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."),
+				DMI_MATCH(DMI_PRODUCT_NAME, "ASUS EXPERTBOOK B1400CEAE"),
+		},
+	},
+	{}
+};
+
+static void asus_disable_nvme_d3cold(struct pci_dev *pdev)
+{
+	if (dmi_check_system(asus_nvme_broken_d3cold_table) > 0)
+		pci_d3cold_disable(pdev);
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, 0x9a09, asus_disable_nvme_d3cold);
+
 #ifdef CONFIG_SUSPEND
 /*
  * Root Ports on some AMD SoCs advertise PME_Support for D3hot and D3cold, but