diff mbox series

x86/PCI: Revert: "Clip only host bridge windows for E820 regions"

Message ID 20220612144325.85366-1-hdegoede@redhat.com (mailing list archive)
State Accepted
Commit dd104bcc2cf233234f82bfc4bd5b8ab32cdbf117
Headers show
Series x86/PCI: Revert: "Clip only host bridge windows for E820 regions" | expand

Commit Message

Hans de Goede June 12, 2022, 2:43 p.m. UTC
Clipping the bridge windows directly from pci_acpi_root_prepare_resources()
instead of clipping from arch_remove_reservations(), has a number of
unforseen consequences.

If there is an e820 reservation in the middle of a bridge window, then
the smallest of the 2 remaining parts of the window will be also clipped
off. Where as the previous code would clip regions requested by devices,
rather then the entire window, leaving regions which were either entirely
above or below a reservation in the middle of the window alone.

E.g. on the Steam Deck this leads to this log message:

acpi PNP0A08:00: clipped [mem 0x80000000-0xf7ffffff window] to [mem 0xa0100000-0xf7ffffff window]

which then gets followed by these log messages:

pci 0000:00:01.2: can't claim BAR 14 [mem 0x80600000-0x806fffff]: no compatible bridge window
pci 0000:00:01.3: can't claim BAR 14 [mem 0x80500000-0x805fffff]: no compatible bridge window

and many more of these. Ultimately this leads to the Steam Deck
no longer booting properly, so revert the change.

Note this is not a clean revert, this revert keeps the later change
to make the clipping dependent on a new pci_use_e820 bool, moving
the checking of this bool to arch_remove_reservations().

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=216109
Fixes: 4c5e242d3e93 ("x86/PCI: Clip only host bridge windows for E820 regions")
Reported-and-tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
---
 arch/x86/include/asm/e820/api.h |  5 -----
 arch/x86/include/asm/pci_x86.h  |  8 ++++++++
 arch/x86/kernel/resource.c      | 14 +++++++++-----
 arch/x86/pci/acpi.c             |  8 +-------
 4 files changed, 18 insertions(+), 17 deletions(-)

Comments

Bjorn Helgaas June 13, 2022, 11:15 p.m. UTC | #1
On Sun, Jun 12, 2022 at 04:43:25PM +0200, Hans de Goede wrote:
> Clipping the bridge windows directly from pci_acpi_root_prepare_resources()
> instead of clipping from arch_remove_reservations(), has a number of
> unforseen consequences.
> 
> If there is an e820 reservation in the middle of a bridge window, then
> the smallest of the 2 remaining parts of the window will be also clipped
> off. Where as the previous code would clip regions requested by devices,
> rather then the entire window, leaving regions which were either entirely
> above or below a reservation in the middle of the window alone.
> 
> E.g. on the Steam Deck this leads to this log message:
> 
> acpi PNP0A08:00: clipped [mem 0x80000000-0xf7ffffff window] to [mem 0xa0100000-0xf7ffffff window]
> 
> which then gets followed by these log messages:
> 
> pci 0000:00:01.2: can't claim BAR 14 [mem 0x80600000-0x806fffff]: no compatible bridge window
> pci 0000:00:01.3: can't claim BAR 14 [mem 0x80500000-0x805fffff]: no compatible bridge window
> 
> and many more of these. Ultimately this leads to the Steam Deck
> no longer booting properly, so revert the change.
> 
> Note this is not a clean revert, this revert keeps the later change
> to make the clipping dependent on a new pci_use_e820 bool, moving
> the checking of this bool to arch_remove_reservations().

4c5e242d3e93 was definitely a mistake (my fault).  My intent was to
mainly to improve logging of the clipping, but I didn't implement it
well.

That said, I'd like to understand the connection between the messages
you mention and the failure.  There are four bridges whose MMIO
windows were in the [mem 0x80000000-0x9fffffff] area that we clipped
out.  The log shows that we moved all those windows and the devices in
them to the [mem 0xa0100000-0xf7ffffff] area that remained after
clipping.

So I think this *should* have worked even though we moved things
around unnecessarily.  What am I missing?

The E820 map reports [mem 0xa0000000-0xa00fffff] in the middle of the
_CRS, and we currently trim that out.  We think this is a firmware
defect, so it's likely to break in 2023 if we stop clipping by
default.  I'm concerned that there may be other things in _CRS that we
need to avoid, but firmware isn't telling us about them.

Or there's some dependency in the devices that we moved on their
original addresses, e.g., firmware on the device latched the address
and didn't notice the reassignment.

[1] https://bugzilla.kernel.org/attachment.cgi?id=301154

> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=216109
> Fixes: 4c5e242d3e93 ("x86/PCI: Clip only host bridge windows for E820 regions")
> Reported-and-tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
> ---
>  arch/x86/include/asm/e820/api.h |  5 -----
>  arch/x86/include/asm/pci_x86.h  |  8 ++++++++
>  arch/x86/kernel/resource.c      | 14 +++++++++-----
>  arch/x86/pci/acpi.c             |  8 +-------
>  4 files changed, 18 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/x86/include/asm/e820/api.h b/arch/x86/include/asm/e820/api.h
> index 5a39ed59b6db..e8f58ddd06d9 100644
> --- a/arch/x86/include/asm/e820/api.h
> +++ b/arch/x86/include/asm/e820/api.h
> @@ -4,9 +4,6 @@
>  
>  #include <asm/e820/types.h>
>  
> -struct device;
> -struct resource;
> -
>  extern struct e820_table *e820_table;
>  extern struct e820_table *e820_table_kexec;
>  extern struct e820_table *e820_table_firmware;
> @@ -46,8 +43,6 @@ extern void e820__register_nosave_regions(unsigned long limit_pfn);
>  
>  extern int  e820__get_entry_type(u64 start, u64 end);
>  
> -extern void remove_e820_regions(struct device *dev, struct resource *avail);
> -
>  /*
>   * Returns true iff the specified range [start,end) is completely contained inside
>   * the ISA region.
> diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
> index f52a886d35cf..70533fdcbf02 100644
> --- a/arch/x86/include/asm/pci_x86.h
> +++ b/arch/x86/include/asm/pci_x86.h
> @@ -69,6 +69,8 @@ void pcibios_scan_specific_bus(int busn);
>  
>  /* pci-irq.c */
>  
> +struct pci_dev;
> +
>  struct irq_info {
>  	u8 bus, devfn;			/* Bus, device and function */
>  	struct {
> @@ -246,3 +248,9 @@ static inline void mmio_config_writel(void __iomem *pos, u32 val)
>  # define x86_default_pci_init_irq	NULL
>  # define x86_default_pci_fixup_irqs	NULL
>  #endif
> +
> +#if defined(CONFIG_PCI) && defined(CONFIG_ACPI)
> +extern bool pci_use_e820;
> +#else
> +#define pci_use_e820 false
> +#endif
> diff --git a/arch/x86/kernel/resource.c b/arch/x86/kernel/resource.c
> index db2b350a37b7..bba1abd05bfe 100644
> --- a/arch/x86/kernel/resource.c
> +++ b/arch/x86/kernel/resource.c
> @@ -1,7 +1,8 @@
>  // SPDX-License-Identifier: GPL-2.0
> -#include <linux/dev_printk.h>
>  #include <linux/ioport.h>
> +#include <linux/printk.h>
>  #include <asm/e820/api.h>
> +#include <asm/pci_x86.h>
>  
>  static void resource_clip(struct resource *res, resource_size_t start,
>  			  resource_size_t end)
> @@ -24,14 +25,14 @@ static void resource_clip(struct resource *res, resource_size_t start,
>  		res->start = end + 1;
>  }
>  
> -void remove_e820_regions(struct device *dev, struct resource *avail)
> +static void remove_e820_regions(struct resource *avail)
>  {
>  	int i;
>  	struct e820_entry *entry;
>  	u64 e820_start, e820_end;
>  	struct resource orig = *avail;
>  
> -	if (!(avail->flags & IORESOURCE_MEM))
> +	if (!pci_use_e820)
>  		return;
>  
>  	for (i = 0; i < e820_table->nr_entries; i++) {
> @@ -41,7 +42,7 @@ void remove_e820_regions(struct device *dev, struct resource *avail)
>  
>  		resource_clip(avail, e820_start, e820_end);
>  		if (orig.start != avail->start || orig.end != avail->end) {
> -			dev_info(dev, "clipped %pR to %pR for e820 entry [mem %#010Lx-%#010Lx]\n",
> +			pr_info("clipped %pR to %pR for e820 entry [mem %#010Lx-%#010Lx]\n",
>  				 &orig, avail, e820_start, e820_end);
>  			orig = *avail;
>  		}
> @@ -55,6 +56,9 @@ void arch_remove_reservations(struct resource *avail)
>  	 * the low 1MB unconditionally, as this area is needed for some ISA
>  	 * cards requiring a memory range, e.g. the i82365 PCMCIA controller.
>  	 */
> -	if (avail->flags & IORESOURCE_MEM)
> +	if (avail->flags & IORESOURCE_MEM) {
>  		resource_clip(avail, BIOS_ROM_BASE, BIOS_ROM_END);
> +
> +		remove_e820_regions(avail);
> +	}
>  }
> diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
> index a4f43054bc79..2f82480fd430 100644
> --- a/arch/x86/pci/acpi.c
> +++ b/arch/x86/pci/acpi.c
> @@ -8,7 +8,6 @@
>  #include <linux/pci-acpi.h>
>  #include <asm/numa.h>
>  #include <asm/pci_x86.h>
> -#include <asm/e820/api.h>
>  
>  struct pci_root_info {
>  	struct acpi_pci_root_info common;
> @@ -20,7 +19,7 @@ struct pci_root_info {
>  #endif
>  };
>  
> -static bool pci_use_e820 = true;
> +bool pci_use_e820 = true;
>  static bool pci_use_crs = true;
>  static bool pci_ignore_seg;
>  
> @@ -387,11 +386,6 @@ static int pci_acpi_root_prepare_resources(struct acpi_pci_root_info *ci)
>  
>  	status = acpi_pci_probe_root_resources(ci);
>  
> -	if (pci_use_e820) {
> -		resource_list_for_each_entry(entry, &ci->resources)
> -			remove_e820_regions(&device->dev, entry->res);
> -	}
> -
>  	if (pci_use_crs) {
>  		resource_list_for_each_entry_safe(entry, tmp, &ci->resources)
>  			if (resource_is_pcicfg_ioport(entry->res))
> -- 
> 2.36.0
>
Hans de Goede June 14, 2022, 8:15 a.m. UTC | #2
Hi,

On 6/14/22 01:15, Bjorn Helgaas wrote:
> On Sun, Jun 12, 2022 at 04:43:25PM +0200, Hans de Goede wrote:
>> Clipping the bridge windows directly from pci_acpi_root_prepare_resources()
>> instead of clipping from arch_remove_reservations(), has a number of
>> unforseen consequences.
>>
>> If there is an e820 reservation in the middle of a bridge window, then
>> the smallest of the 2 remaining parts of the window will be also clipped
>> off. Where as the previous code would clip regions requested by devices,
>> rather then the entire window, leaving regions which were either entirely
>> above or below a reservation in the middle of the window alone.
>>
>> E.g. on the Steam Deck this leads to this log message:
>>
>> acpi PNP0A08:00: clipped [mem 0x80000000-0xf7ffffff window] to [mem 0xa0100000-0xf7ffffff window]
>>
>> which then gets followed by these log messages:
>>
>> pci 0000:00:01.2: can't claim BAR 14 [mem 0x80600000-0x806fffff]: no compatible bridge window
>> pci 0000:00:01.3: can't claim BAR 14 [mem 0x80500000-0x805fffff]: no compatible bridge window
>>
>> and many more of these. Ultimately this leads to the Steam Deck
>> no longer booting properly, so revert the change.
>>
>> Note this is not a clean revert, this revert keeps the later change
>> to make the clipping dependent on a new pci_use_e820 bool, moving
>> the checking of this bool to arch_remove_reservations().
> 
> 4c5e242d3e93 was definitely a mistake (my fault).  My intent was to
> mainly to improve logging of the clipping, but I didn't implement it
> well.
> 
> That said, I'd like to understand the connection between the messages
> you mention and the failure.  There are four bridges whose MMIO
> windows were in the [mem 0x80000000-0x9fffffff] area that we clipped
> out.  The log shows that we moved all those windows and the devices in
> them to the [mem 0xa0100000-0xf7ffffff] area that remained after
> clipping.
> 
> So I think this *should* have worked even though we moved things
> around unnecessarily.  What am I missing?

I don't know? My guess is that maybe the ACPI table do MMIO accesses
somewhere to hardcoded addresses and moving things breaks the ACPI
tables.

> 
> The E820 map reports [mem 0xa0000000-0xa00fffff] in the middle of the
> _CRS, and we currently trim that out.  We think this is a firmware
> defect, so it's likely to break in 2023 if we stop clipping by
> default.  I'm concerned that there may be other things in _CRS that we
> need to avoid, but firmware isn't telling us about them.
> 
> Or there's some dependency in the devices that we moved on their
> original addresses, e.g., firmware on the device latched the address
> and didn't notice the reassignment.

Right this is the most likely cause I believe.

Regards,

Hans



> 
> [1] https://bugzilla.kernel.org/attachment.cgi?id=301154
> 
>> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=216109
>> Fixes: 4c5e242d3e93 ("x86/PCI: Clip only host bridge windows for E820 regions")
>> Reported-and-tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
>> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
>> ---
>>  arch/x86/include/asm/e820/api.h |  5 -----
>>  arch/x86/include/asm/pci_x86.h  |  8 ++++++++
>>  arch/x86/kernel/resource.c      | 14 +++++++++-----
>>  arch/x86/pci/acpi.c             |  8 +-------
>>  4 files changed, 18 insertions(+), 17 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/e820/api.h b/arch/x86/include/asm/e820/api.h
>> index 5a39ed59b6db..e8f58ddd06d9 100644
>> --- a/arch/x86/include/asm/e820/api.h
>> +++ b/arch/x86/include/asm/e820/api.h
>> @@ -4,9 +4,6 @@
>>  
>>  #include <asm/e820/types.h>
>>  
>> -struct device;
>> -struct resource;
>> -
>>  extern struct e820_table *e820_table;
>>  extern struct e820_table *e820_table_kexec;
>>  extern struct e820_table *e820_table_firmware;
>> @@ -46,8 +43,6 @@ extern void e820__register_nosave_regions(unsigned long limit_pfn);
>>  
>>  extern int  e820__get_entry_type(u64 start, u64 end);
>>  
>> -extern void remove_e820_regions(struct device *dev, struct resource *avail);
>> -
>>  /*
>>   * Returns true iff the specified range [start,end) is completely contained inside
>>   * the ISA region.
>> diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
>> index f52a886d35cf..70533fdcbf02 100644
>> --- a/arch/x86/include/asm/pci_x86.h
>> +++ b/arch/x86/include/asm/pci_x86.h
>> @@ -69,6 +69,8 @@ void pcibios_scan_specific_bus(int busn);
>>  
>>  /* pci-irq.c */
>>  
>> +struct pci_dev;
>> +
>>  struct irq_info {
>>  	u8 bus, devfn;			/* Bus, device and function */
>>  	struct {
>> @@ -246,3 +248,9 @@ static inline void mmio_config_writel(void __iomem *pos, u32 val)
>>  # define x86_default_pci_init_irq	NULL
>>  # define x86_default_pci_fixup_irqs	NULL
>>  #endif
>> +
>> +#if defined(CONFIG_PCI) && defined(CONFIG_ACPI)
>> +extern bool pci_use_e820;
>> +#else
>> +#define pci_use_e820 false
>> +#endif
>> diff --git a/arch/x86/kernel/resource.c b/arch/x86/kernel/resource.c
>> index db2b350a37b7..bba1abd05bfe 100644
>> --- a/arch/x86/kernel/resource.c
>> +++ b/arch/x86/kernel/resource.c
>> @@ -1,7 +1,8 @@
>>  // SPDX-License-Identifier: GPL-2.0
>> -#include <linux/dev_printk.h>
>>  #include <linux/ioport.h>
>> +#include <linux/printk.h>
>>  #include <asm/e820/api.h>
>> +#include <asm/pci_x86.h>
>>  
>>  static void resource_clip(struct resource *res, resource_size_t start,
>>  			  resource_size_t end)
>> @@ -24,14 +25,14 @@ static void resource_clip(struct resource *res, resource_size_t start,
>>  		res->start = end + 1;
>>  }
>>  
>> -void remove_e820_regions(struct device *dev, struct resource *avail)
>> +static void remove_e820_regions(struct resource *avail)
>>  {
>>  	int i;
>>  	struct e820_entry *entry;
>>  	u64 e820_start, e820_end;
>>  	struct resource orig = *avail;
>>  
>> -	if (!(avail->flags & IORESOURCE_MEM))
>> +	if (!pci_use_e820)
>>  		return;
>>  
>>  	for (i = 0; i < e820_table->nr_entries; i++) {
>> @@ -41,7 +42,7 @@ void remove_e820_regions(struct device *dev, struct resource *avail)
>>  
>>  		resource_clip(avail, e820_start, e820_end);
>>  		if (orig.start != avail->start || orig.end != avail->end) {
>> -			dev_info(dev, "clipped %pR to %pR for e820 entry [mem %#010Lx-%#010Lx]\n",
>> +			pr_info("clipped %pR to %pR for e820 entry [mem %#010Lx-%#010Lx]\n",
>>  				 &orig, avail, e820_start, e820_end);
>>  			orig = *avail;
>>  		}
>> @@ -55,6 +56,9 @@ void arch_remove_reservations(struct resource *avail)
>>  	 * the low 1MB unconditionally, as this area is needed for some ISA
>>  	 * cards requiring a memory range, e.g. the i82365 PCMCIA controller.
>>  	 */
>> -	if (avail->flags & IORESOURCE_MEM)
>> +	if (avail->flags & IORESOURCE_MEM) {
>>  		resource_clip(avail, BIOS_ROM_BASE, BIOS_ROM_END);
>> +
>> +		remove_e820_regions(avail);
>> +	}
>>  }
>> diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
>> index a4f43054bc79..2f82480fd430 100644
>> --- a/arch/x86/pci/acpi.c
>> +++ b/arch/x86/pci/acpi.c
>> @@ -8,7 +8,6 @@
>>  #include <linux/pci-acpi.h>
>>  #include <asm/numa.h>
>>  #include <asm/pci_x86.h>
>> -#include <asm/e820/api.h>
>>  
>>  struct pci_root_info {
>>  	struct acpi_pci_root_info common;
>> @@ -20,7 +19,7 @@ struct pci_root_info {
>>  #endif
>>  };
>>  
>> -static bool pci_use_e820 = true;
>> +bool pci_use_e820 = true;
>>  static bool pci_use_crs = true;
>>  static bool pci_ignore_seg;
>>  
>> @@ -387,11 +386,6 @@ static int pci_acpi_root_prepare_resources(struct acpi_pci_root_info *ci)
>>  
>>  	status = acpi_pci_probe_root_resources(ci);
>>  
>> -	if (pci_use_e820) {
>> -		resource_list_for_each_entry(entry, &ci->resources)
>> -			remove_e820_regions(&device->dev, entry->res);
>> -	}
>> -
>>  	if (pci_use_crs) {
>>  		resource_list_for_each_entry_safe(entry, tmp, &ci->resources)
>>  			if (resource_is_pcicfg_ioport(entry->res))
>> -- 
>> 2.36.0
>>
>
Andy Shevchenko June 14, 2022, 12:28 p.m. UTC | #3
On Sun, Jun 12, 2022 at 04:43:25PM +0200, Hans de Goede wrote:
> Clipping the bridge windows directly from pci_acpi_root_prepare_resources()
> instead of clipping from arch_remove_reservations(), has a number of
> unforseen consequences.
> 
> If there is an e820 reservation in the middle of a bridge window, then
> the smallest of the 2 remaining parts of the window will be also clipped
> off. Where as the previous code would clip regions requested by devices,
> rather then the entire window, leaving regions which were either entirely
> above or below a reservation in the middle of the window alone.
> 
> E.g. on the Steam Deck this leads to this log message:
> 
> acpi PNP0A08:00: clipped [mem 0x80000000-0xf7ffffff window] to [mem 0xa0100000-0xf7ffffff window]
> 
> which then gets followed by these log messages:
> 
> pci 0000:00:01.2: can't claim BAR 14 [mem 0x80600000-0x806fffff]: no compatible bridge window
> pci 0000:00:01.3: can't claim BAR 14 [mem 0x80500000-0x805fffff]: no compatible bridge window
> 
> and many more of these. Ultimately this leads to the Steam Deck
> no longer booting properly, so revert the change.
> 
> Note this is not a clean revert, this revert keeps the later change
> to make the clipping dependent on a new pci_use_e820 bool, moving
> the checking of this bool to arch_remove_reservations().

It does _not_ fix the Intel MID case. It requires to have my patch applied as well.
So the difference as I see is the flags checking. I believe that you still need to
have it in case pci_use_e820 == true. But it might be that I missed an importan
detail.
Andy Shevchenko June 14, 2022, 1:58 p.m. UTC | #4
On Tue, Jun 14, 2022 at 03:28:36PM +0300, Andy Shevchenko wrote:
> On Sun, Jun 12, 2022 at 04:43:25PM +0200, Hans de Goede wrote:
> > Clipping the bridge windows directly from pci_acpi_root_prepare_resources()
> > instead of clipping from arch_remove_reservations(), has a number of
> > unforseen consequences.
> > 
> > If there is an e820 reservation in the middle of a bridge window, then
> > the smallest of the 2 remaining parts of the window will be also clipped
> > off. Where as the previous code would clip regions requested by devices,
> > rather then the entire window, leaving regions which were either entirely
> > above or below a reservation in the middle of the window alone.
> > 
> > E.g. on the Steam Deck this leads to this log message:
> > 
> > acpi PNP0A08:00: clipped [mem 0x80000000-0xf7ffffff window] to [mem 0xa0100000-0xf7ffffff window]
> > 
> > which then gets followed by these log messages:
> > 
> > pci 0000:00:01.2: can't claim BAR 14 [mem 0x80600000-0x806fffff]: no compatible bridge window
> > pci 0000:00:01.3: can't claim BAR 14 [mem 0x80500000-0x805fffff]: no compatible bridge window
> > 
> > and many more of these. Ultimately this leads to the Steam Deck
> > no longer booting properly, so revert the change.
> > 
> > Note this is not a clean revert, this revert keeps the later change
> > to make the clipping dependent on a new pci_use_e820 bool, moving
> > the checking of this bool to arch_remove_reservations().
> 
> It does _not_ fix the Intel MID case. It requires to have my patch applied as well.
> So the difference as I see is the flags checking. I believe that you still need to
> have it in case pci_use_e820 == true. But it might be that I missed an importan
> detail.

Yeah, I missed that I have compiled a tested something else, and not what I was
focused on.

That said, I double checked with your patch and nothing else and v5.19-rc2
based tree works nicely.

Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Bjorn Helgaas June 14, 2022, 3:17 p.m. UTC | #5
On Tue, Jun 14, 2022 at 10:15:29AM +0200, Hans de Goede wrote:
> On 6/14/22 01:15, Bjorn Helgaas wrote:
> > On Sun, Jun 12, 2022 at 04:43:25PM +0200, Hans de Goede wrote:
> >> Clipping the bridge windows directly from pci_acpi_root_prepare_resources()
> >> instead of clipping from arch_remove_reservations(), has a number of
> >> unforseen consequences.
> >>
> >> If there is an e820 reservation in the middle of a bridge window, then
> >> the smallest of the 2 remaining parts of the window will be also clipped
> >> off. Where as the previous code would clip regions requested by devices,
> >> rather then the entire window, leaving regions which were either entirely
> >> above or below a reservation in the middle of the window alone.
> >>
> >> E.g. on the Steam Deck this leads to this log message:
> >>
> >> acpi PNP0A08:00: clipped [mem 0x80000000-0xf7ffffff window] to [mem 0xa0100000-0xf7ffffff window]
> >>
> >> which then gets followed by these log messages:
> >>
> >> pci 0000:00:01.2: can't claim BAR 14 [mem 0x80600000-0x806fffff]: no compatible bridge window
> >> pci 0000:00:01.3: can't claim BAR 14 [mem 0x80500000-0x805fffff]: no compatible bridge window
> >>
> >> and many more of these. Ultimately this leads to the Steam Deck
> >> no longer booting properly, so revert the change.
> >>
> >> Note this is not a clean revert, this revert keeps the later change
> >> to make the clipping dependent on a new pci_use_e820 bool, moving
> >> the checking of this bool to arch_remove_reservations().
> > 
> > 4c5e242d3e93 was definitely a mistake (my fault).  My intent was to
> > mainly to improve logging of the clipping, but I didn't implement it
> > well.
> > 
> > That said, I'd like to understand the connection between the messages
> > you mention and the failure.  There are four bridges whose MMIO
> > windows were in the [mem 0x80000000-0x9fffffff] area that we clipped
> > out.  The log shows that we moved all those windows and the devices in
> > them to the [mem 0xa0100000-0xf7ffffff] area that remained after
> > clipping.
> > 
> > So I think this *should* have worked even though we moved things
> > around unnecessarily.  What am I missing?
> 
> I don't know? My guess is that maybe the ACPI table do MMIO accesses
> somewhere to hardcoded addresses and moving things breaks the ACPI
> tables.

This would be a firmware defect, IMHO.  There is a mechanism (_DSM for
preserving PCI Boot Configurations) for the firmware to tell us about
things that can't be moved.

> > The E820 map reports [mem 0xa0000000-0xa00fffff] in the middle of the
> > _CRS, and we currently trim that out.  We think this is a firmware
> > defect, so it's likely to break in 2023 if we stop clipping by
> > default.  I'm concerned that there may be other things in _CRS that we
> > need to avoid, but firmware isn't telling us about them.
> > 
> > Or there's some dependency in the devices that we moved on their
> > original addresses, e.g., firmware on the device latched the address
> > and didn't notice the reassignment.
> 
> Right this is the most likely cause I believe.

This would be another defect, in the device this time.  If we can
identify the device, possibly we could quirk around it.

Either one will be back to bite us in the future if we support
rebalancing resources to make room for hot-added devices.  I *think*
Windows already supports this kind of rebalancing.  Anyway, this is
why we need to dig a little deeper to figure out exactly what's going
wrong here.

Bjorn
Hans de Goede June 14, 2022, 3:47 p.m. UTC | #6
Hi,

On 6/14/22 17:17, Bjorn Helgaas wrote:
> On Tue, Jun 14, 2022 at 10:15:29AM +0200, Hans de Goede wrote:
>> On 6/14/22 01:15, Bjorn Helgaas wrote:
>>> On Sun, Jun 12, 2022 at 04:43:25PM +0200, Hans de Goede wrote:
>>>> Clipping the bridge windows directly from pci_acpi_root_prepare_resources()
>>>> instead of clipping from arch_remove_reservations(), has a number of
>>>> unforseen consequences.
>>>>
>>>> If there is an e820 reservation in the middle of a bridge window, then
>>>> the smallest of the 2 remaining parts of the window will be also clipped
>>>> off. Where as the previous code would clip regions requested by devices,
>>>> rather then the entire window, leaving regions which were either entirely
>>>> above or below a reservation in the middle of the window alone.
>>>>
>>>> E.g. on the Steam Deck this leads to this log message:
>>>>
>>>> acpi PNP0A08:00: clipped [mem 0x80000000-0xf7ffffff window] to [mem 0xa0100000-0xf7ffffff window]
>>>>
>>>> which then gets followed by these log messages:
>>>>
>>>> pci 0000:00:01.2: can't claim BAR 14 [mem 0x80600000-0x806fffff]: no compatible bridge window
>>>> pci 0000:00:01.3: can't claim BAR 14 [mem 0x80500000-0x805fffff]: no compatible bridge window
>>>>
>>>> and many more of these. Ultimately this leads to the Steam Deck
>>>> no longer booting properly, so revert the change.
>>>>
>>>> Note this is not a clean revert, this revert keeps the later change
>>>> to make the clipping dependent on a new pci_use_e820 bool, moving
>>>> the checking of this bool to arch_remove_reservations().
>>>
>>> 4c5e242d3e93 was definitely a mistake (my fault).  My intent was to
>>> mainly to improve logging of the clipping, but I didn't implement it
>>> well.
>>>
>>> That said, I'd like to understand the connection between the messages
>>> you mention and the failure.  There are four bridges whose MMIO
>>> windows were in the [mem 0x80000000-0x9fffffff] area that we clipped
>>> out.  The log shows that we moved all those windows and the devices in
>>> them to the [mem 0xa0100000-0xf7ffffff] area that remained after
>>> clipping.
>>>
>>> So I think this *should* have worked even though we moved things
>>> around unnecessarily.  What am I missing?
>>
>> I don't know? My guess is that maybe the ACPI table do MMIO accesses
>> somewhere to hardcoded addresses and moving things breaks the ACPI
>> tables.
> 
> This would be a firmware defect, IMHO.  There is a mechanism (_DSM for
> preserving PCI Boot Configurations) for the firmware to tell us about
> things that can't be moved.
> 
>>> The E820 map reports [mem 0xa0000000-0xa00fffff] in the middle of the
>>> _CRS, and we currently trim that out.  We think this is a firmware
>>> defect, so it's likely to break in 2023 if we stop clipping by
>>> default.  I'm concerned that there may be other things in _CRS that we
>>> need to avoid, but firmware isn't telling us about them.
>>>
>>> Or there's some dependency in the devices that we moved on their
>>> original addresses, e.g., firmware on the device latched the address
>>> and didn't notice the reassignment.
>>
>> Right this is the most likely cause I believe.
> 
> This would be another defect, in the device this time.  If we can
> identify the device, possibly we could quirk around it.
> 
> Either one will be back to bite us in the future if we support
> rebalancing resources to make room for hot-added devices.  I *think*
> Windows already supports this kind of rebalancing.  Anyway, this is
> why we need to dig a little deeper to figure out exactly what's going
> wrong here.

Have you looked at the log of the failed boot in the Steam Deck kernel
bugzilla? Everything there seems to work just fine and then the system
just hangs. I think that maybe it cannot find its root disk, so maybe
an NVME issue ?

I'm afraid that if we cannot figure out the exact root cause from
the failed boot log there is not much more we can do. Unless you
have an idea how to debug this further and want to ask the reporter
for more info?

Regards,

Hans
Guilherme G. Piccoli June 14, 2022, 10:49 p.m. UTC | #7
On 14/06/2022 12:47, Hans de Goede wrote:
> [...]
> 
> Have you looked at the log of the failed boot in the Steam Deck kernel
> bugzilla? Everything there seems to work just fine and then the system
> just hangs. I think that maybe it cannot find its root disk, so maybe
> an NVME issue ?
> 

*Exactly* that - NVMe device is the root disk, it cannot boot since the
device doesn't work, hence no rootfs =)

Patch looks great from my side, thanks !
cheers,


Guilherme
Bjorn Helgaas June 14, 2022, 11:01 p.m. UTC | #8
[+cc NVMe folks]

On Tue, Jun 14, 2022 at 07:49:27PM -0300, Guilherme G. Piccoli wrote:
> On 14/06/2022 12:47, Hans de Goede wrote:
> > [...]
> > 
> > Have you looked at the log of the failed boot in the Steam Deck kernel
> > bugzilla? Everything there seems to work just fine and then the system
> > just hangs. I think that maybe it cannot find its root disk, so maybe
> > an NVME issue ?
> 
> *Exactly* that - NVMe device is the root disk, it cannot boot since the
> device doesn't work, hence no rootfs =)

Beginning of thread: https://lore.kernel.org/r/20220612144325.85366-1-hdegoede@redhat.com

Steam Deck broke because we erroneously trimmed out the PCI host
bridge window where BIOS had placed most devices, successfully
reassigned all the PCI bridge windows and BARs, but some devices,
apparently including NVMe, didn't work at the new addresses.

Do you NVMe folks know of gotchas in this area?  I want to know
because we'd like to be able to move devices around someday to make
room for hot-added devices.

This reassignment happened before drivers claimed the devices, so from
a PCI point of view, I don't know why the NVMe device wouldn't work at
the new address.

Bjorn
Keith Busch June 14, 2022, 11:47 p.m. UTC | #9
On Tue, Jun 14, 2022 at 06:01:28PM -0500, Bjorn Helgaas wrote:
> [+cc NVMe folks]
> 
> On Tue, Jun 14, 2022 at 07:49:27PM -0300, Guilherme G. Piccoli wrote:
> > On 14/06/2022 12:47, Hans de Goede wrote:
> > > [...]
> > > 
> > > Have you looked at the log of the failed boot in the Steam Deck kernel
> > > bugzilla? Everything there seems to work just fine and then the system
> > > just hangs. I think that maybe it cannot find its root disk, so maybe
> > > an NVME issue ?
> > 
> > *Exactly* that - NVMe device is the root disk, it cannot boot since the
> > device doesn't work, hence no rootfs =)
> 
> Beginning of thread: https://lore.kernel.org/r/20220612144325.85366-1-hdegoede@redhat.com
> 
> Steam Deck broke because we erroneously trimmed out the PCI host
> bridge window where BIOS had placed most devices, successfully
> reassigned all the PCI bridge windows and BARs, but some devices,
> apparently including NVMe, didn't work at the new addresses.
> 
> Do you NVMe folks know of gotchas in this area?  I want to know
> because we'd like to be able to move devices around someday to make
> room for hot-added devices.
> 
> This reassignment happened before drivers claimed the devices, so from
> a PCI point of view, I don't know why the NVMe device wouldn't work at
> the new address.

The probe status quickly returns ENODEV. Based on the output (we don't log
much, so this is just an educated guesss), I think that means the driver read
all F's from the status register, which indicates we can't read it when using
the reassigned memory window.

Why changing memory windows may not work tends to be platform or device
specific. Considering the renumbered windows didn't cause a problem for other
devices, it sounds like this nvme device may be broken.
Bjorn Helgaas June 15, 2022, 3:11 p.m. UTC | #10
On Tue, Jun 14, 2022 at 04:47:35PM -0700, Keith Busch wrote:
> On Tue, Jun 14, 2022 at 06:01:28PM -0500, Bjorn Helgaas wrote:
> > [+cc NVMe folks]
> > 
> > On Tue, Jun 14, 2022 at 07:49:27PM -0300, Guilherme G. Piccoli wrote:
> > > On 14/06/2022 12:47, Hans de Goede wrote:
> > > > [...]
> > > > 
> > > > Have you looked at the log of the failed boot in the Steam Deck kernel
> > > > bugzilla? Everything there seems to work just fine and then the system
> > > > just hangs. I think that maybe it cannot find its root disk, so maybe
> > > > an NVME issue ?
> > > 
> > > *Exactly* that - NVMe device is the root disk, it cannot boot since the
> > > device doesn't work, hence no rootfs =)
> > 
> > Beginning of thread: https://lore.kernel.org/r/20220612144325.85366-1-hdegoede@redhat.com
> > 
> > Steam Deck broke because we erroneously trimmed out the PCI host
> > bridge window where BIOS had placed most devices, successfully
> > reassigned all the PCI bridge windows and BARs, but some devices,
> > apparently including NVMe, didn't work at the new addresses.
> > 
> > Do you NVMe folks know of gotchas in this area?  I want to know
> > because we'd like to be able to move devices around someday to
> > make room for hot-added devices.
> > 
> > This reassignment happened before drivers claimed the devices, so
> > from a PCI point of view, I don't know why the NVMe device
> > wouldn't work at the new address.
> 
> The probe status quickly returns ENODEV. Based on the output (we
> don't log much, so this is just an educated guesss), I think that
> means the driver read all F's from the status register, which
> indicates we can't read it when using the reassigned memory window.
> 
> Why changing memory windows may not work tends to be platform or
> device specific. Considering the renumbered windows didn't cause a
> problem for other devices, it sounds like this nvme device may be
> broken.

It sounds like you've seen this sort of problem before, so we
shouldn't assume that it's safe to reassign BARs.

I think Windows supports rebalancing, but it does look like drivers
have the ability to veto it:

  https://docs.microsoft.com/en-us/windows-hardware/drivers/kernel/stopping-a-device-to-rebalance-resources
  https://docs.microsoft.com/en-us/windows-hardware/drivers/wdf/the-pnp-manager-redistributes-system-resources

So I suppose if/when we support rebalancing, it'll have to be an
opt-in thing for each driver.
Bjorn Helgaas June 17, 2022, 7:55 p.m. UTC | #11
[+cc Andy, Benjamin, Keith]

On Sun, Jun 12, 2022 at 04:43:25PM +0200, Hans de Goede wrote:
> Clipping the bridge windows directly from pci_acpi_root_prepare_resources()
> instead of clipping from arch_remove_reservations(), has a number of
> unforseen consequences.
> 
> If there is an e820 reservation in the middle of a bridge window, then
> the smallest of the 2 remaining parts of the window will be also clipped
> off. Where as the previous code would clip regions requested by devices,
> rather then the entire window, leaving regions which were either entirely
> above or below a reservation in the middle of the window alone.
> 
> E.g. on the Steam Deck this leads to this log message:
> 
> acpi PNP0A08:00: clipped [mem 0x80000000-0xf7ffffff window] to [mem 0xa0100000-0xf7ffffff window]
> 
> which then gets followed by these log messages:
> 
> pci 0000:00:01.2: can't claim BAR 14 [mem 0x80600000-0x806fffff]: no compatible bridge window
> pci 0000:00:01.3: can't claim BAR 14 [mem 0x80500000-0x805fffff]: no compatible bridge window
> 
> and many more of these. Ultimately this leads to the Steam Deck
> no longer booting properly, so revert the change.
> 
> Note this is not a clean revert, this revert keeps the later change
> to make the clipping dependent on a new pci_use_e820 bool, moving
> the checking of this bool to arch_remove_reservations().
> 
> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=216109
> Fixes: 4c5e242d3e93 ("x86/PCI: Clip only host bridge windows for E820 regions")
> Reported-and-tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
> Signed-off-by: Hans de Goede <hdegoede@redhat.com>

Applied to for-linus for v5.19.  Thanks for cleaning up my mess, Hans.

> ---
>  arch/x86/include/asm/e820/api.h |  5 -----
>  arch/x86/include/asm/pci_x86.h  |  8 ++++++++
>  arch/x86/kernel/resource.c      | 14 +++++++++-----
>  arch/x86/pci/acpi.c             |  8 +-------
>  4 files changed, 18 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/x86/include/asm/e820/api.h b/arch/x86/include/asm/e820/api.h
> index 5a39ed59b6db..e8f58ddd06d9 100644
> --- a/arch/x86/include/asm/e820/api.h
> +++ b/arch/x86/include/asm/e820/api.h
> @@ -4,9 +4,6 @@
>  
>  #include <asm/e820/types.h>
>  
> -struct device;
> -struct resource;
> -
>  extern struct e820_table *e820_table;
>  extern struct e820_table *e820_table_kexec;
>  extern struct e820_table *e820_table_firmware;
> @@ -46,8 +43,6 @@ extern void e820__register_nosave_regions(unsigned long limit_pfn);
>  
>  extern int  e820__get_entry_type(u64 start, u64 end);
>  
> -extern void remove_e820_regions(struct device *dev, struct resource *avail);
> -
>  /*
>   * Returns true iff the specified range [start,end) is completely contained inside
>   * the ISA region.
> diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
> index f52a886d35cf..70533fdcbf02 100644
> --- a/arch/x86/include/asm/pci_x86.h
> +++ b/arch/x86/include/asm/pci_x86.h
> @@ -69,6 +69,8 @@ void pcibios_scan_specific_bus(int busn);
>  
>  /* pci-irq.c */
>  
> +struct pci_dev;
> +
>  struct irq_info {
>  	u8 bus, devfn;			/* Bus, device and function */
>  	struct {
> @@ -246,3 +248,9 @@ static inline void mmio_config_writel(void __iomem *pos, u32 val)
>  # define x86_default_pci_init_irq	NULL
>  # define x86_default_pci_fixup_irqs	NULL
>  #endif
> +
> +#if defined(CONFIG_PCI) && defined(CONFIG_ACPI)
> +extern bool pci_use_e820;
> +#else
> +#define pci_use_e820 false
> +#endif
> diff --git a/arch/x86/kernel/resource.c b/arch/x86/kernel/resource.c
> index db2b350a37b7..bba1abd05bfe 100644
> --- a/arch/x86/kernel/resource.c
> +++ b/arch/x86/kernel/resource.c
> @@ -1,7 +1,8 @@
>  // SPDX-License-Identifier: GPL-2.0
> -#include <linux/dev_printk.h>
>  #include <linux/ioport.h>
> +#include <linux/printk.h>
>  #include <asm/e820/api.h>
> +#include <asm/pci_x86.h>
>  
>  static void resource_clip(struct resource *res, resource_size_t start,
>  			  resource_size_t end)
> @@ -24,14 +25,14 @@ static void resource_clip(struct resource *res, resource_size_t start,
>  		res->start = end + 1;
>  }
>  
> -void remove_e820_regions(struct device *dev, struct resource *avail)
> +static void remove_e820_regions(struct resource *avail)
>  {
>  	int i;
>  	struct e820_entry *entry;
>  	u64 e820_start, e820_end;
>  	struct resource orig = *avail;
>  
> -	if (!(avail->flags & IORESOURCE_MEM))
> +	if (!pci_use_e820)
>  		return;
>  
>  	for (i = 0; i < e820_table->nr_entries; i++) {
> @@ -41,7 +42,7 @@ void remove_e820_regions(struct device *dev, struct resource *avail)
>  
>  		resource_clip(avail, e820_start, e820_end);
>  		if (orig.start != avail->start || orig.end != avail->end) {
> -			dev_info(dev, "clipped %pR to %pR for e820 entry [mem %#010Lx-%#010Lx]\n",
> +			pr_info("clipped %pR to %pR for e820 entry [mem %#010Lx-%#010Lx]\n",
>  				 &orig, avail, e820_start, e820_end);
>  			orig = *avail;
>  		}
> @@ -55,6 +56,9 @@ void arch_remove_reservations(struct resource *avail)
>  	 * the low 1MB unconditionally, as this area is needed for some ISA
>  	 * cards requiring a memory range, e.g. the i82365 PCMCIA controller.
>  	 */
> -	if (avail->flags & IORESOURCE_MEM)
> +	if (avail->flags & IORESOURCE_MEM) {
>  		resource_clip(avail, BIOS_ROM_BASE, BIOS_ROM_END);
> +
> +		remove_e820_regions(avail);
> +	}
>  }
> diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
> index a4f43054bc79..2f82480fd430 100644
> --- a/arch/x86/pci/acpi.c
> +++ b/arch/x86/pci/acpi.c
> @@ -8,7 +8,6 @@
>  #include <linux/pci-acpi.h>
>  #include <asm/numa.h>
>  #include <asm/pci_x86.h>
> -#include <asm/e820/api.h>
>  
>  struct pci_root_info {
>  	struct acpi_pci_root_info common;
> @@ -20,7 +19,7 @@ struct pci_root_info {
>  #endif
>  };
>  
> -static bool pci_use_e820 = true;
> +bool pci_use_e820 = true;
>  static bool pci_use_crs = true;
>  static bool pci_ignore_seg;
>  
> @@ -387,11 +386,6 @@ static int pci_acpi_root_prepare_resources(struct acpi_pci_root_info *ci)
>  
>  	status = acpi_pci_probe_root_resources(ci);
>  
> -	if (pci_use_e820) {
> -		resource_list_for_each_entry(entry, &ci->resources)
> -			remove_e820_regions(&device->dev, entry->res);
> -	}
> -
>  	if (pci_use_crs) {
>  		resource_list_for_each_entry_safe(entry, tmp, &ci->resources)
>  			if (resource_is_pcicfg_ioport(entry->res))
> -- 
> 2.36.0
>
Keith Busch June 17, 2022, 8:27 p.m. UTC | #12
On Wed, Jun 15, 2022 at 10:11:00AM -0500, Bjorn Helgaas wrote:
> On Tue, Jun 14, 2022 at 04:47:35PM -0700, Keith Busch wrote:
> > On Tue, Jun 14, 2022 at 06:01:28PM -0500, Bjorn Helgaas wrote:
> > > [+cc NVMe folks]
> > > 
> > > On Tue, Jun 14, 2022 at 07:49:27PM -0300, Guilherme G. Piccoli wrote:
> > > > On 14/06/2022 12:47, Hans de Goede wrote:
> > > > > [...]
> > > > > 
> > > > > Have you looked at the log of the failed boot in the Steam Deck kernel
> > > > > bugzilla? Everything there seems to work just fine and then the system
> > > > > just hangs. I think that maybe it cannot find its root disk, so maybe
> > > > > an NVME issue ?
> > > > 
> > > > *Exactly* that - NVMe device is the root disk, it cannot boot since the
> > > > device doesn't work, hence no rootfs =)
> > > 
> > > Beginning of thread: https://lore.kernel.org/r/20220612144325.85366-1-hdegoede@redhat.com
> > > 
> > > Steam Deck broke because we erroneously trimmed out the PCI host
> > > bridge window where BIOS had placed most devices, successfully
> > > reassigned all the PCI bridge windows and BARs, but some devices,
> > > apparently including NVMe, didn't work at the new addresses.
> > > 
> > > Do you NVMe folks know of gotchas in this area?  I want to know
> > > because we'd like to be able to move devices around someday to
> > > make room for hot-added devices.
> > > 
> > > This reassignment happened before drivers claimed the devices, so
> > > from a PCI point of view, I don't know why the NVMe device
> > > wouldn't work at the new address.
> > 
> > The probe status quickly returns ENODEV. Based on the output (we
> > don't log much, so this is just an educated guesss), I think that
> > means the driver read all F's from the status register, which
> > indicates we can't read it when using the reassigned memory window.
> > 
> > Why changing memory windows may not work tends to be platform or
> > device specific. Considering the renumbered windows didn't cause a
> > problem for other devices, it sounds like this nvme device may be
> > broken.
> 
> It sounds like you've seen this sort of problem before, so we
> shouldn't assume that it's safe to reassign BARs.

I haven't seen this type of problem in years, but as I recall, it was always
low-end consumer crap that couldn't deal with changing BARs; you're stuck with
whatever was set after it was initially powered on. The PCI topology will
reflect the expected renumbering, but whatever is happening on the other side
of the PCI function seems to be unaware of the change.
diff mbox series

Patch

diff --git a/arch/x86/include/asm/e820/api.h b/arch/x86/include/asm/e820/api.h
index 5a39ed59b6db..e8f58ddd06d9 100644
--- a/arch/x86/include/asm/e820/api.h
+++ b/arch/x86/include/asm/e820/api.h
@@ -4,9 +4,6 @@ 
 
 #include <asm/e820/types.h>
 
-struct device;
-struct resource;
-
 extern struct e820_table *e820_table;
 extern struct e820_table *e820_table_kexec;
 extern struct e820_table *e820_table_firmware;
@@ -46,8 +43,6 @@  extern void e820__register_nosave_regions(unsigned long limit_pfn);
 
 extern int  e820__get_entry_type(u64 start, u64 end);
 
-extern void remove_e820_regions(struct device *dev, struct resource *avail);
-
 /*
  * Returns true iff the specified range [start,end) is completely contained inside
  * the ISA region.
diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
index f52a886d35cf..70533fdcbf02 100644
--- a/arch/x86/include/asm/pci_x86.h
+++ b/arch/x86/include/asm/pci_x86.h
@@ -69,6 +69,8 @@  void pcibios_scan_specific_bus(int busn);
 
 /* pci-irq.c */
 
+struct pci_dev;
+
 struct irq_info {
 	u8 bus, devfn;			/* Bus, device and function */
 	struct {
@@ -246,3 +248,9 @@  static inline void mmio_config_writel(void __iomem *pos, u32 val)
 # define x86_default_pci_init_irq	NULL
 # define x86_default_pci_fixup_irqs	NULL
 #endif
+
+#if defined(CONFIG_PCI) && defined(CONFIG_ACPI)
+extern bool pci_use_e820;
+#else
+#define pci_use_e820 false
+#endif
diff --git a/arch/x86/kernel/resource.c b/arch/x86/kernel/resource.c
index db2b350a37b7..bba1abd05bfe 100644
--- a/arch/x86/kernel/resource.c
+++ b/arch/x86/kernel/resource.c
@@ -1,7 +1,8 @@ 
 // SPDX-License-Identifier: GPL-2.0
-#include <linux/dev_printk.h>
 #include <linux/ioport.h>
+#include <linux/printk.h>
 #include <asm/e820/api.h>
+#include <asm/pci_x86.h>
 
 static void resource_clip(struct resource *res, resource_size_t start,
 			  resource_size_t end)
@@ -24,14 +25,14 @@  static void resource_clip(struct resource *res, resource_size_t start,
 		res->start = end + 1;
 }
 
-void remove_e820_regions(struct device *dev, struct resource *avail)
+static void remove_e820_regions(struct resource *avail)
 {
 	int i;
 	struct e820_entry *entry;
 	u64 e820_start, e820_end;
 	struct resource orig = *avail;
 
-	if (!(avail->flags & IORESOURCE_MEM))
+	if (!pci_use_e820)
 		return;
 
 	for (i = 0; i < e820_table->nr_entries; i++) {
@@ -41,7 +42,7 @@  void remove_e820_regions(struct device *dev, struct resource *avail)
 
 		resource_clip(avail, e820_start, e820_end);
 		if (orig.start != avail->start || orig.end != avail->end) {
-			dev_info(dev, "clipped %pR to %pR for e820 entry [mem %#010Lx-%#010Lx]\n",
+			pr_info("clipped %pR to %pR for e820 entry [mem %#010Lx-%#010Lx]\n",
 				 &orig, avail, e820_start, e820_end);
 			orig = *avail;
 		}
@@ -55,6 +56,9 @@  void arch_remove_reservations(struct resource *avail)
 	 * the low 1MB unconditionally, as this area is needed for some ISA
 	 * cards requiring a memory range, e.g. the i82365 PCMCIA controller.
 	 */
-	if (avail->flags & IORESOURCE_MEM)
+	if (avail->flags & IORESOURCE_MEM) {
 		resource_clip(avail, BIOS_ROM_BASE, BIOS_ROM_END);
+
+		remove_e820_regions(avail);
+	}
 }
diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
index a4f43054bc79..2f82480fd430 100644
--- a/arch/x86/pci/acpi.c
+++ b/arch/x86/pci/acpi.c
@@ -8,7 +8,6 @@ 
 #include <linux/pci-acpi.h>
 #include <asm/numa.h>
 #include <asm/pci_x86.h>
-#include <asm/e820/api.h>
 
 struct pci_root_info {
 	struct acpi_pci_root_info common;
@@ -20,7 +19,7 @@  struct pci_root_info {
 #endif
 };
 
-static bool pci_use_e820 = true;
+bool pci_use_e820 = true;
 static bool pci_use_crs = true;
 static bool pci_ignore_seg;
 
@@ -387,11 +386,6 @@  static int pci_acpi_root_prepare_resources(struct acpi_pci_root_info *ci)
 
 	status = acpi_pci_probe_root_resources(ci);
 
-	if (pci_use_e820) {
-		resource_list_for_each_entry(entry, &ci->resources)
-			remove_e820_regions(&device->dev, entry->res);
-	}
-
 	if (pci_use_crs) {
 		resource_list_for_each_entry_safe(entry, tmp, &ci->resources)
 			if (resource_is_pcicfg_ioport(entry->res))