Message ID | 20220612144325.85366-1-hdegoede@redhat.com (mailing list archive) |
---|---|
State | Accepted |
Commit | dd104bcc2cf233234f82bfc4bd5b8ab32cdbf117 |
Headers | show |
Series | x86/PCI: Revert: "Clip only host bridge windows for E820 regions" | expand |
On Sun, Jun 12, 2022 at 04:43:25PM +0200, Hans de Goede wrote: > Clipping the bridge windows directly from pci_acpi_root_prepare_resources() > instead of clipping from arch_remove_reservations(), has a number of > unforseen consequences. > > If there is an e820 reservation in the middle of a bridge window, then > the smallest of the 2 remaining parts of the window will be also clipped > off. Where as the previous code would clip regions requested by devices, > rather then the entire window, leaving regions which were either entirely > above or below a reservation in the middle of the window alone. > > E.g. on the Steam Deck this leads to this log message: > > acpi PNP0A08:00: clipped [mem 0x80000000-0xf7ffffff window] to [mem 0xa0100000-0xf7ffffff window] > > which then gets followed by these log messages: > > pci 0000:00:01.2: can't claim BAR 14 [mem 0x80600000-0x806fffff]: no compatible bridge window > pci 0000:00:01.3: can't claim BAR 14 [mem 0x80500000-0x805fffff]: no compatible bridge window > > and many more of these. Ultimately this leads to the Steam Deck > no longer booting properly, so revert the change. > > Note this is not a clean revert, this revert keeps the later change > to make the clipping dependent on a new pci_use_e820 bool, moving > the checking of this bool to arch_remove_reservations(). 4c5e242d3e93 was definitely a mistake (my fault). My intent was to mainly to improve logging of the clipping, but I didn't implement it well. That said, I'd like to understand the connection between the messages you mention and the failure. There are four bridges whose MMIO windows were in the [mem 0x80000000-0x9fffffff] area that we clipped out. The log shows that we moved all those windows and the devices in them to the [mem 0xa0100000-0xf7ffffff] area that remained after clipping. So I think this *should* have worked even though we moved things around unnecessarily. What am I missing? The E820 map reports [mem 0xa0000000-0xa00fffff] in the middle of the _CRS, and we currently trim that out. We think this is a firmware defect, so it's likely to break in 2023 if we stop clipping by default. I'm concerned that there may be other things in _CRS that we need to avoid, but firmware isn't telling us about them. Or there's some dependency in the devices that we moved on their original addresses, e.g., firmware on the device latched the address and didn't notice the reassignment. [1] https://bugzilla.kernel.org/attachment.cgi?id=301154 > BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=216109 > Fixes: 4c5e242d3e93 ("x86/PCI: Clip only host bridge windows for E820 regions") > Reported-and-tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> > Signed-off-by: Hans de Goede <hdegoede@redhat.com> > --- > arch/x86/include/asm/e820/api.h | 5 ----- > arch/x86/include/asm/pci_x86.h | 8 ++++++++ > arch/x86/kernel/resource.c | 14 +++++++++----- > arch/x86/pci/acpi.c | 8 +------- > 4 files changed, 18 insertions(+), 17 deletions(-) > > diff --git a/arch/x86/include/asm/e820/api.h b/arch/x86/include/asm/e820/api.h > index 5a39ed59b6db..e8f58ddd06d9 100644 > --- a/arch/x86/include/asm/e820/api.h > +++ b/arch/x86/include/asm/e820/api.h > @@ -4,9 +4,6 @@ > > #include <asm/e820/types.h> > > -struct device; > -struct resource; > - > extern struct e820_table *e820_table; > extern struct e820_table *e820_table_kexec; > extern struct e820_table *e820_table_firmware; > @@ -46,8 +43,6 @@ extern void e820__register_nosave_regions(unsigned long limit_pfn); > > extern int e820__get_entry_type(u64 start, u64 end); > > -extern void remove_e820_regions(struct device *dev, struct resource *avail); > - > /* > * Returns true iff the specified range [start,end) is completely contained inside > * the ISA region. > diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h > index f52a886d35cf..70533fdcbf02 100644 > --- a/arch/x86/include/asm/pci_x86.h > +++ b/arch/x86/include/asm/pci_x86.h > @@ -69,6 +69,8 @@ void pcibios_scan_specific_bus(int busn); > > /* pci-irq.c */ > > +struct pci_dev; > + > struct irq_info { > u8 bus, devfn; /* Bus, device and function */ > struct { > @@ -246,3 +248,9 @@ static inline void mmio_config_writel(void __iomem *pos, u32 val) > # define x86_default_pci_init_irq NULL > # define x86_default_pci_fixup_irqs NULL > #endif > + > +#if defined(CONFIG_PCI) && defined(CONFIG_ACPI) > +extern bool pci_use_e820; > +#else > +#define pci_use_e820 false > +#endif > diff --git a/arch/x86/kernel/resource.c b/arch/x86/kernel/resource.c > index db2b350a37b7..bba1abd05bfe 100644 > --- a/arch/x86/kernel/resource.c > +++ b/arch/x86/kernel/resource.c > @@ -1,7 +1,8 @@ > // SPDX-License-Identifier: GPL-2.0 > -#include <linux/dev_printk.h> > #include <linux/ioport.h> > +#include <linux/printk.h> > #include <asm/e820/api.h> > +#include <asm/pci_x86.h> > > static void resource_clip(struct resource *res, resource_size_t start, > resource_size_t end) > @@ -24,14 +25,14 @@ static void resource_clip(struct resource *res, resource_size_t start, > res->start = end + 1; > } > > -void remove_e820_regions(struct device *dev, struct resource *avail) > +static void remove_e820_regions(struct resource *avail) > { > int i; > struct e820_entry *entry; > u64 e820_start, e820_end; > struct resource orig = *avail; > > - if (!(avail->flags & IORESOURCE_MEM)) > + if (!pci_use_e820) > return; > > for (i = 0; i < e820_table->nr_entries; i++) { > @@ -41,7 +42,7 @@ void remove_e820_regions(struct device *dev, struct resource *avail) > > resource_clip(avail, e820_start, e820_end); > if (orig.start != avail->start || orig.end != avail->end) { > - dev_info(dev, "clipped %pR to %pR for e820 entry [mem %#010Lx-%#010Lx]\n", > + pr_info("clipped %pR to %pR for e820 entry [mem %#010Lx-%#010Lx]\n", > &orig, avail, e820_start, e820_end); > orig = *avail; > } > @@ -55,6 +56,9 @@ void arch_remove_reservations(struct resource *avail) > * the low 1MB unconditionally, as this area is needed for some ISA > * cards requiring a memory range, e.g. the i82365 PCMCIA controller. > */ > - if (avail->flags & IORESOURCE_MEM) > + if (avail->flags & IORESOURCE_MEM) { > resource_clip(avail, BIOS_ROM_BASE, BIOS_ROM_END); > + > + remove_e820_regions(avail); > + } > } > diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c > index a4f43054bc79..2f82480fd430 100644 > --- a/arch/x86/pci/acpi.c > +++ b/arch/x86/pci/acpi.c > @@ -8,7 +8,6 @@ > #include <linux/pci-acpi.h> > #include <asm/numa.h> > #include <asm/pci_x86.h> > -#include <asm/e820/api.h> > > struct pci_root_info { > struct acpi_pci_root_info common; > @@ -20,7 +19,7 @@ struct pci_root_info { > #endif > }; > > -static bool pci_use_e820 = true; > +bool pci_use_e820 = true; > static bool pci_use_crs = true; > static bool pci_ignore_seg; > > @@ -387,11 +386,6 @@ static int pci_acpi_root_prepare_resources(struct acpi_pci_root_info *ci) > > status = acpi_pci_probe_root_resources(ci); > > - if (pci_use_e820) { > - resource_list_for_each_entry(entry, &ci->resources) > - remove_e820_regions(&device->dev, entry->res); > - } > - > if (pci_use_crs) { > resource_list_for_each_entry_safe(entry, tmp, &ci->resources) > if (resource_is_pcicfg_ioport(entry->res)) > -- > 2.36.0 >
Hi, On 6/14/22 01:15, Bjorn Helgaas wrote: > On Sun, Jun 12, 2022 at 04:43:25PM +0200, Hans de Goede wrote: >> Clipping the bridge windows directly from pci_acpi_root_prepare_resources() >> instead of clipping from arch_remove_reservations(), has a number of >> unforseen consequences. >> >> If there is an e820 reservation in the middle of a bridge window, then >> the smallest of the 2 remaining parts of the window will be also clipped >> off. Where as the previous code would clip regions requested by devices, >> rather then the entire window, leaving regions which were either entirely >> above or below a reservation in the middle of the window alone. >> >> E.g. on the Steam Deck this leads to this log message: >> >> acpi PNP0A08:00: clipped [mem 0x80000000-0xf7ffffff window] to [mem 0xa0100000-0xf7ffffff window] >> >> which then gets followed by these log messages: >> >> pci 0000:00:01.2: can't claim BAR 14 [mem 0x80600000-0x806fffff]: no compatible bridge window >> pci 0000:00:01.3: can't claim BAR 14 [mem 0x80500000-0x805fffff]: no compatible bridge window >> >> and many more of these. Ultimately this leads to the Steam Deck >> no longer booting properly, so revert the change. >> >> Note this is not a clean revert, this revert keeps the later change >> to make the clipping dependent on a new pci_use_e820 bool, moving >> the checking of this bool to arch_remove_reservations(). > > 4c5e242d3e93 was definitely a mistake (my fault). My intent was to > mainly to improve logging of the clipping, but I didn't implement it > well. > > That said, I'd like to understand the connection between the messages > you mention and the failure. There are four bridges whose MMIO > windows were in the [mem 0x80000000-0x9fffffff] area that we clipped > out. The log shows that we moved all those windows and the devices in > them to the [mem 0xa0100000-0xf7ffffff] area that remained after > clipping. > > So I think this *should* have worked even though we moved things > around unnecessarily. What am I missing? I don't know? My guess is that maybe the ACPI table do MMIO accesses somewhere to hardcoded addresses and moving things breaks the ACPI tables. > > The E820 map reports [mem 0xa0000000-0xa00fffff] in the middle of the > _CRS, and we currently trim that out. We think this is a firmware > defect, so it's likely to break in 2023 if we stop clipping by > default. I'm concerned that there may be other things in _CRS that we > need to avoid, but firmware isn't telling us about them. > > Or there's some dependency in the devices that we moved on their > original addresses, e.g., firmware on the device latched the address > and didn't notice the reassignment. Right this is the most likely cause I believe. Regards, Hans > > [1] https://bugzilla.kernel.org/attachment.cgi?id=301154 > >> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=216109 >> Fixes: 4c5e242d3e93 ("x86/PCI: Clip only host bridge windows for E820 regions") >> Reported-and-tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> >> Signed-off-by: Hans de Goede <hdegoede@redhat.com> >> --- >> arch/x86/include/asm/e820/api.h | 5 ----- >> arch/x86/include/asm/pci_x86.h | 8 ++++++++ >> arch/x86/kernel/resource.c | 14 +++++++++----- >> arch/x86/pci/acpi.c | 8 +------- >> 4 files changed, 18 insertions(+), 17 deletions(-) >> >> diff --git a/arch/x86/include/asm/e820/api.h b/arch/x86/include/asm/e820/api.h >> index 5a39ed59b6db..e8f58ddd06d9 100644 >> --- a/arch/x86/include/asm/e820/api.h >> +++ b/arch/x86/include/asm/e820/api.h >> @@ -4,9 +4,6 @@ >> >> #include <asm/e820/types.h> >> >> -struct device; >> -struct resource; >> - >> extern struct e820_table *e820_table; >> extern struct e820_table *e820_table_kexec; >> extern struct e820_table *e820_table_firmware; >> @@ -46,8 +43,6 @@ extern void e820__register_nosave_regions(unsigned long limit_pfn); >> >> extern int e820__get_entry_type(u64 start, u64 end); >> >> -extern void remove_e820_regions(struct device *dev, struct resource *avail); >> - >> /* >> * Returns true iff the specified range [start,end) is completely contained inside >> * the ISA region. >> diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h >> index f52a886d35cf..70533fdcbf02 100644 >> --- a/arch/x86/include/asm/pci_x86.h >> +++ b/arch/x86/include/asm/pci_x86.h >> @@ -69,6 +69,8 @@ void pcibios_scan_specific_bus(int busn); >> >> /* pci-irq.c */ >> >> +struct pci_dev; >> + >> struct irq_info { >> u8 bus, devfn; /* Bus, device and function */ >> struct { >> @@ -246,3 +248,9 @@ static inline void mmio_config_writel(void __iomem *pos, u32 val) >> # define x86_default_pci_init_irq NULL >> # define x86_default_pci_fixup_irqs NULL >> #endif >> + >> +#if defined(CONFIG_PCI) && defined(CONFIG_ACPI) >> +extern bool pci_use_e820; >> +#else >> +#define pci_use_e820 false >> +#endif >> diff --git a/arch/x86/kernel/resource.c b/arch/x86/kernel/resource.c >> index db2b350a37b7..bba1abd05bfe 100644 >> --- a/arch/x86/kernel/resource.c >> +++ b/arch/x86/kernel/resource.c >> @@ -1,7 +1,8 @@ >> // SPDX-License-Identifier: GPL-2.0 >> -#include <linux/dev_printk.h> >> #include <linux/ioport.h> >> +#include <linux/printk.h> >> #include <asm/e820/api.h> >> +#include <asm/pci_x86.h> >> >> static void resource_clip(struct resource *res, resource_size_t start, >> resource_size_t end) >> @@ -24,14 +25,14 @@ static void resource_clip(struct resource *res, resource_size_t start, >> res->start = end + 1; >> } >> >> -void remove_e820_regions(struct device *dev, struct resource *avail) >> +static void remove_e820_regions(struct resource *avail) >> { >> int i; >> struct e820_entry *entry; >> u64 e820_start, e820_end; >> struct resource orig = *avail; >> >> - if (!(avail->flags & IORESOURCE_MEM)) >> + if (!pci_use_e820) >> return; >> >> for (i = 0; i < e820_table->nr_entries; i++) { >> @@ -41,7 +42,7 @@ void remove_e820_regions(struct device *dev, struct resource *avail) >> >> resource_clip(avail, e820_start, e820_end); >> if (orig.start != avail->start || orig.end != avail->end) { >> - dev_info(dev, "clipped %pR to %pR for e820 entry [mem %#010Lx-%#010Lx]\n", >> + pr_info("clipped %pR to %pR for e820 entry [mem %#010Lx-%#010Lx]\n", >> &orig, avail, e820_start, e820_end); >> orig = *avail; >> } >> @@ -55,6 +56,9 @@ void arch_remove_reservations(struct resource *avail) >> * the low 1MB unconditionally, as this area is needed for some ISA >> * cards requiring a memory range, e.g. the i82365 PCMCIA controller. >> */ >> - if (avail->flags & IORESOURCE_MEM) >> + if (avail->flags & IORESOURCE_MEM) { >> resource_clip(avail, BIOS_ROM_BASE, BIOS_ROM_END); >> + >> + remove_e820_regions(avail); >> + } >> } >> diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c >> index a4f43054bc79..2f82480fd430 100644 >> --- a/arch/x86/pci/acpi.c >> +++ b/arch/x86/pci/acpi.c >> @@ -8,7 +8,6 @@ >> #include <linux/pci-acpi.h> >> #include <asm/numa.h> >> #include <asm/pci_x86.h> >> -#include <asm/e820/api.h> >> >> struct pci_root_info { >> struct acpi_pci_root_info common; >> @@ -20,7 +19,7 @@ struct pci_root_info { >> #endif >> }; >> >> -static bool pci_use_e820 = true; >> +bool pci_use_e820 = true; >> static bool pci_use_crs = true; >> static bool pci_ignore_seg; >> >> @@ -387,11 +386,6 @@ static int pci_acpi_root_prepare_resources(struct acpi_pci_root_info *ci) >> >> status = acpi_pci_probe_root_resources(ci); >> >> - if (pci_use_e820) { >> - resource_list_for_each_entry(entry, &ci->resources) >> - remove_e820_regions(&device->dev, entry->res); >> - } >> - >> if (pci_use_crs) { >> resource_list_for_each_entry_safe(entry, tmp, &ci->resources) >> if (resource_is_pcicfg_ioport(entry->res)) >> -- >> 2.36.0 >> >
On Sun, Jun 12, 2022 at 04:43:25PM +0200, Hans de Goede wrote: > Clipping the bridge windows directly from pci_acpi_root_prepare_resources() > instead of clipping from arch_remove_reservations(), has a number of > unforseen consequences. > > If there is an e820 reservation in the middle of a bridge window, then > the smallest of the 2 remaining parts of the window will be also clipped > off. Where as the previous code would clip regions requested by devices, > rather then the entire window, leaving regions which were either entirely > above or below a reservation in the middle of the window alone. > > E.g. on the Steam Deck this leads to this log message: > > acpi PNP0A08:00: clipped [mem 0x80000000-0xf7ffffff window] to [mem 0xa0100000-0xf7ffffff window] > > which then gets followed by these log messages: > > pci 0000:00:01.2: can't claim BAR 14 [mem 0x80600000-0x806fffff]: no compatible bridge window > pci 0000:00:01.3: can't claim BAR 14 [mem 0x80500000-0x805fffff]: no compatible bridge window > > and many more of these. Ultimately this leads to the Steam Deck > no longer booting properly, so revert the change. > > Note this is not a clean revert, this revert keeps the later change > to make the clipping dependent on a new pci_use_e820 bool, moving > the checking of this bool to arch_remove_reservations(). It does _not_ fix the Intel MID case. It requires to have my patch applied as well. So the difference as I see is the flags checking. I believe that you still need to have it in case pci_use_e820 == true. But it might be that I missed an importan detail.
On Tue, Jun 14, 2022 at 03:28:36PM +0300, Andy Shevchenko wrote: > On Sun, Jun 12, 2022 at 04:43:25PM +0200, Hans de Goede wrote: > > Clipping the bridge windows directly from pci_acpi_root_prepare_resources() > > instead of clipping from arch_remove_reservations(), has a number of > > unforseen consequences. > > > > If there is an e820 reservation in the middle of a bridge window, then > > the smallest of the 2 remaining parts of the window will be also clipped > > off. Where as the previous code would clip regions requested by devices, > > rather then the entire window, leaving regions which were either entirely > > above or below a reservation in the middle of the window alone. > > > > E.g. on the Steam Deck this leads to this log message: > > > > acpi PNP0A08:00: clipped [mem 0x80000000-0xf7ffffff window] to [mem 0xa0100000-0xf7ffffff window] > > > > which then gets followed by these log messages: > > > > pci 0000:00:01.2: can't claim BAR 14 [mem 0x80600000-0x806fffff]: no compatible bridge window > > pci 0000:00:01.3: can't claim BAR 14 [mem 0x80500000-0x805fffff]: no compatible bridge window > > > > and many more of these. Ultimately this leads to the Steam Deck > > no longer booting properly, so revert the change. > > > > Note this is not a clean revert, this revert keeps the later change > > to make the clipping dependent on a new pci_use_e820 bool, moving > > the checking of this bool to arch_remove_reservations(). > > It does _not_ fix the Intel MID case. It requires to have my patch applied as well. > So the difference as I see is the flags checking. I believe that you still need to > have it in case pci_use_e820 == true. But it might be that I missed an importan > detail. Yeah, I missed that I have compiled a tested something else, and not what I was focused on. That said, I double checked with your patch and nothing else and v5.19-rc2 based tree works nicely. Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
On Tue, Jun 14, 2022 at 10:15:29AM +0200, Hans de Goede wrote: > On 6/14/22 01:15, Bjorn Helgaas wrote: > > On Sun, Jun 12, 2022 at 04:43:25PM +0200, Hans de Goede wrote: > >> Clipping the bridge windows directly from pci_acpi_root_prepare_resources() > >> instead of clipping from arch_remove_reservations(), has a number of > >> unforseen consequences. > >> > >> If there is an e820 reservation in the middle of a bridge window, then > >> the smallest of the 2 remaining parts of the window will be also clipped > >> off. Where as the previous code would clip regions requested by devices, > >> rather then the entire window, leaving regions which were either entirely > >> above or below a reservation in the middle of the window alone. > >> > >> E.g. on the Steam Deck this leads to this log message: > >> > >> acpi PNP0A08:00: clipped [mem 0x80000000-0xf7ffffff window] to [mem 0xa0100000-0xf7ffffff window] > >> > >> which then gets followed by these log messages: > >> > >> pci 0000:00:01.2: can't claim BAR 14 [mem 0x80600000-0x806fffff]: no compatible bridge window > >> pci 0000:00:01.3: can't claim BAR 14 [mem 0x80500000-0x805fffff]: no compatible bridge window > >> > >> and many more of these. Ultimately this leads to the Steam Deck > >> no longer booting properly, so revert the change. > >> > >> Note this is not a clean revert, this revert keeps the later change > >> to make the clipping dependent on a new pci_use_e820 bool, moving > >> the checking of this bool to arch_remove_reservations(). > > > > 4c5e242d3e93 was definitely a mistake (my fault). My intent was to > > mainly to improve logging of the clipping, but I didn't implement it > > well. > > > > That said, I'd like to understand the connection between the messages > > you mention and the failure. There are four bridges whose MMIO > > windows were in the [mem 0x80000000-0x9fffffff] area that we clipped > > out. The log shows that we moved all those windows and the devices in > > them to the [mem 0xa0100000-0xf7ffffff] area that remained after > > clipping. > > > > So I think this *should* have worked even though we moved things > > around unnecessarily. What am I missing? > > I don't know? My guess is that maybe the ACPI table do MMIO accesses > somewhere to hardcoded addresses and moving things breaks the ACPI > tables. This would be a firmware defect, IMHO. There is a mechanism (_DSM for preserving PCI Boot Configurations) for the firmware to tell us about things that can't be moved. > > The E820 map reports [mem 0xa0000000-0xa00fffff] in the middle of the > > _CRS, and we currently trim that out. We think this is a firmware > > defect, so it's likely to break in 2023 if we stop clipping by > > default. I'm concerned that there may be other things in _CRS that we > > need to avoid, but firmware isn't telling us about them. > > > > Or there's some dependency in the devices that we moved on their > > original addresses, e.g., firmware on the device latched the address > > and didn't notice the reassignment. > > Right this is the most likely cause I believe. This would be another defect, in the device this time. If we can identify the device, possibly we could quirk around it. Either one will be back to bite us in the future if we support rebalancing resources to make room for hot-added devices. I *think* Windows already supports this kind of rebalancing. Anyway, this is why we need to dig a little deeper to figure out exactly what's going wrong here. Bjorn
Hi, On 6/14/22 17:17, Bjorn Helgaas wrote: > On Tue, Jun 14, 2022 at 10:15:29AM +0200, Hans de Goede wrote: >> On 6/14/22 01:15, Bjorn Helgaas wrote: >>> On Sun, Jun 12, 2022 at 04:43:25PM +0200, Hans de Goede wrote: >>>> Clipping the bridge windows directly from pci_acpi_root_prepare_resources() >>>> instead of clipping from arch_remove_reservations(), has a number of >>>> unforseen consequences. >>>> >>>> If there is an e820 reservation in the middle of a bridge window, then >>>> the smallest of the 2 remaining parts of the window will be also clipped >>>> off. Where as the previous code would clip regions requested by devices, >>>> rather then the entire window, leaving regions which were either entirely >>>> above or below a reservation in the middle of the window alone. >>>> >>>> E.g. on the Steam Deck this leads to this log message: >>>> >>>> acpi PNP0A08:00: clipped [mem 0x80000000-0xf7ffffff window] to [mem 0xa0100000-0xf7ffffff window] >>>> >>>> which then gets followed by these log messages: >>>> >>>> pci 0000:00:01.2: can't claim BAR 14 [mem 0x80600000-0x806fffff]: no compatible bridge window >>>> pci 0000:00:01.3: can't claim BAR 14 [mem 0x80500000-0x805fffff]: no compatible bridge window >>>> >>>> and many more of these. Ultimately this leads to the Steam Deck >>>> no longer booting properly, so revert the change. >>>> >>>> Note this is not a clean revert, this revert keeps the later change >>>> to make the clipping dependent on a new pci_use_e820 bool, moving >>>> the checking of this bool to arch_remove_reservations(). >>> >>> 4c5e242d3e93 was definitely a mistake (my fault). My intent was to >>> mainly to improve logging of the clipping, but I didn't implement it >>> well. >>> >>> That said, I'd like to understand the connection between the messages >>> you mention and the failure. There are four bridges whose MMIO >>> windows were in the [mem 0x80000000-0x9fffffff] area that we clipped >>> out. The log shows that we moved all those windows and the devices in >>> them to the [mem 0xa0100000-0xf7ffffff] area that remained after >>> clipping. >>> >>> So I think this *should* have worked even though we moved things >>> around unnecessarily. What am I missing? >> >> I don't know? My guess is that maybe the ACPI table do MMIO accesses >> somewhere to hardcoded addresses and moving things breaks the ACPI >> tables. > > This would be a firmware defect, IMHO. There is a mechanism (_DSM for > preserving PCI Boot Configurations) for the firmware to tell us about > things that can't be moved. > >>> The E820 map reports [mem 0xa0000000-0xa00fffff] in the middle of the >>> _CRS, and we currently trim that out. We think this is a firmware >>> defect, so it's likely to break in 2023 if we stop clipping by >>> default. I'm concerned that there may be other things in _CRS that we >>> need to avoid, but firmware isn't telling us about them. >>> >>> Or there's some dependency in the devices that we moved on their >>> original addresses, e.g., firmware on the device latched the address >>> and didn't notice the reassignment. >> >> Right this is the most likely cause I believe. > > This would be another defect, in the device this time. If we can > identify the device, possibly we could quirk around it. > > Either one will be back to bite us in the future if we support > rebalancing resources to make room for hot-added devices. I *think* > Windows already supports this kind of rebalancing. Anyway, this is > why we need to dig a little deeper to figure out exactly what's going > wrong here. Have you looked at the log of the failed boot in the Steam Deck kernel bugzilla? Everything there seems to work just fine and then the system just hangs. I think that maybe it cannot find its root disk, so maybe an NVME issue ? I'm afraid that if we cannot figure out the exact root cause from the failed boot log there is not much more we can do. Unless you have an idea how to debug this further and want to ask the reporter for more info? Regards, Hans
On 14/06/2022 12:47, Hans de Goede wrote: > [...] > > Have you looked at the log of the failed boot in the Steam Deck kernel > bugzilla? Everything there seems to work just fine and then the system > just hangs. I think that maybe it cannot find its root disk, so maybe > an NVME issue ? > *Exactly* that - NVMe device is the root disk, it cannot boot since the device doesn't work, hence no rootfs =) Patch looks great from my side, thanks ! cheers, Guilherme
[+cc NVMe folks] On Tue, Jun 14, 2022 at 07:49:27PM -0300, Guilherme G. Piccoli wrote: > On 14/06/2022 12:47, Hans de Goede wrote: > > [...] > > > > Have you looked at the log of the failed boot in the Steam Deck kernel > > bugzilla? Everything there seems to work just fine and then the system > > just hangs. I think that maybe it cannot find its root disk, so maybe > > an NVME issue ? > > *Exactly* that - NVMe device is the root disk, it cannot boot since the > device doesn't work, hence no rootfs =) Beginning of thread: https://lore.kernel.org/r/20220612144325.85366-1-hdegoede@redhat.com Steam Deck broke because we erroneously trimmed out the PCI host bridge window where BIOS had placed most devices, successfully reassigned all the PCI bridge windows and BARs, but some devices, apparently including NVMe, didn't work at the new addresses. Do you NVMe folks know of gotchas in this area? I want to know because we'd like to be able to move devices around someday to make room for hot-added devices. This reassignment happened before drivers claimed the devices, so from a PCI point of view, I don't know why the NVMe device wouldn't work at the new address. Bjorn
On Tue, Jun 14, 2022 at 06:01:28PM -0500, Bjorn Helgaas wrote: > [+cc NVMe folks] > > On Tue, Jun 14, 2022 at 07:49:27PM -0300, Guilherme G. Piccoli wrote: > > On 14/06/2022 12:47, Hans de Goede wrote: > > > [...] > > > > > > Have you looked at the log of the failed boot in the Steam Deck kernel > > > bugzilla? Everything there seems to work just fine and then the system > > > just hangs. I think that maybe it cannot find its root disk, so maybe > > > an NVME issue ? > > > > *Exactly* that - NVMe device is the root disk, it cannot boot since the > > device doesn't work, hence no rootfs =) > > Beginning of thread: https://lore.kernel.org/r/20220612144325.85366-1-hdegoede@redhat.com > > Steam Deck broke because we erroneously trimmed out the PCI host > bridge window where BIOS had placed most devices, successfully > reassigned all the PCI bridge windows and BARs, but some devices, > apparently including NVMe, didn't work at the new addresses. > > Do you NVMe folks know of gotchas in this area? I want to know > because we'd like to be able to move devices around someday to make > room for hot-added devices. > > This reassignment happened before drivers claimed the devices, so from > a PCI point of view, I don't know why the NVMe device wouldn't work at > the new address. The probe status quickly returns ENODEV. Based on the output (we don't log much, so this is just an educated guesss), I think that means the driver read all F's from the status register, which indicates we can't read it when using the reassigned memory window. Why changing memory windows may not work tends to be platform or device specific. Considering the renumbered windows didn't cause a problem for other devices, it sounds like this nvme device may be broken.
On Tue, Jun 14, 2022 at 04:47:35PM -0700, Keith Busch wrote: > On Tue, Jun 14, 2022 at 06:01:28PM -0500, Bjorn Helgaas wrote: > > [+cc NVMe folks] > > > > On Tue, Jun 14, 2022 at 07:49:27PM -0300, Guilherme G. Piccoli wrote: > > > On 14/06/2022 12:47, Hans de Goede wrote: > > > > [...] > > > > > > > > Have you looked at the log of the failed boot in the Steam Deck kernel > > > > bugzilla? Everything there seems to work just fine and then the system > > > > just hangs. I think that maybe it cannot find its root disk, so maybe > > > > an NVME issue ? > > > > > > *Exactly* that - NVMe device is the root disk, it cannot boot since the > > > device doesn't work, hence no rootfs =) > > > > Beginning of thread: https://lore.kernel.org/r/20220612144325.85366-1-hdegoede@redhat.com > > > > Steam Deck broke because we erroneously trimmed out the PCI host > > bridge window where BIOS had placed most devices, successfully > > reassigned all the PCI bridge windows and BARs, but some devices, > > apparently including NVMe, didn't work at the new addresses. > > > > Do you NVMe folks know of gotchas in this area? I want to know > > because we'd like to be able to move devices around someday to > > make room for hot-added devices. > > > > This reassignment happened before drivers claimed the devices, so > > from a PCI point of view, I don't know why the NVMe device > > wouldn't work at the new address. > > The probe status quickly returns ENODEV. Based on the output (we > don't log much, so this is just an educated guesss), I think that > means the driver read all F's from the status register, which > indicates we can't read it when using the reassigned memory window. > > Why changing memory windows may not work tends to be platform or > device specific. Considering the renumbered windows didn't cause a > problem for other devices, it sounds like this nvme device may be > broken. It sounds like you've seen this sort of problem before, so we shouldn't assume that it's safe to reassign BARs. I think Windows supports rebalancing, but it does look like drivers have the ability to veto it: https://docs.microsoft.com/en-us/windows-hardware/drivers/kernel/stopping-a-device-to-rebalance-resources https://docs.microsoft.com/en-us/windows-hardware/drivers/wdf/the-pnp-manager-redistributes-system-resources So I suppose if/when we support rebalancing, it'll have to be an opt-in thing for each driver.
[+cc Andy, Benjamin, Keith] On Sun, Jun 12, 2022 at 04:43:25PM +0200, Hans de Goede wrote: > Clipping the bridge windows directly from pci_acpi_root_prepare_resources() > instead of clipping from arch_remove_reservations(), has a number of > unforseen consequences. > > If there is an e820 reservation in the middle of a bridge window, then > the smallest of the 2 remaining parts of the window will be also clipped > off. Where as the previous code would clip regions requested by devices, > rather then the entire window, leaving regions which were either entirely > above or below a reservation in the middle of the window alone. > > E.g. on the Steam Deck this leads to this log message: > > acpi PNP0A08:00: clipped [mem 0x80000000-0xf7ffffff window] to [mem 0xa0100000-0xf7ffffff window] > > which then gets followed by these log messages: > > pci 0000:00:01.2: can't claim BAR 14 [mem 0x80600000-0x806fffff]: no compatible bridge window > pci 0000:00:01.3: can't claim BAR 14 [mem 0x80500000-0x805fffff]: no compatible bridge window > > and many more of these. Ultimately this leads to the Steam Deck > no longer booting properly, so revert the change. > > Note this is not a clean revert, this revert keeps the later change > to make the clipping dependent on a new pci_use_e820 bool, moving > the checking of this bool to arch_remove_reservations(). > > BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=216109 > Fixes: 4c5e242d3e93 ("x86/PCI: Clip only host bridge windows for E820 regions") > Reported-and-tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> > Signed-off-by: Hans de Goede <hdegoede@redhat.com> Applied to for-linus for v5.19. Thanks for cleaning up my mess, Hans. > --- > arch/x86/include/asm/e820/api.h | 5 ----- > arch/x86/include/asm/pci_x86.h | 8 ++++++++ > arch/x86/kernel/resource.c | 14 +++++++++----- > arch/x86/pci/acpi.c | 8 +------- > 4 files changed, 18 insertions(+), 17 deletions(-) > > diff --git a/arch/x86/include/asm/e820/api.h b/arch/x86/include/asm/e820/api.h > index 5a39ed59b6db..e8f58ddd06d9 100644 > --- a/arch/x86/include/asm/e820/api.h > +++ b/arch/x86/include/asm/e820/api.h > @@ -4,9 +4,6 @@ > > #include <asm/e820/types.h> > > -struct device; > -struct resource; > - > extern struct e820_table *e820_table; > extern struct e820_table *e820_table_kexec; > extern struct e820_table *e820_table_firmware; > @@ -46,8 +43,6 @@ extern void e820__register_nosave_regions(unsigned long limit_pfn); > > extern int e820__get_entry_type(u64 start, u64 end); > > -extern void remove_e820_regions(struct device *dev, struct resource *avail); > - > /* > * Returns true iff the specified range [start,end) is completely contained inside > * the ISA region. > diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h > index f52a886d35cf..70533fdcbf02 100644 > --- a/arch/x86/include/asm/pci_x86.h > +++ b/arch/x86/include/asm/pci_x86.h > @@ -69,6 +69,8 @@ void pcibios_scan_specific_bus(int busn); > > /* pci-irq.c */ > > +struct pci_dev; > + > struct irq_info { > u8 bus, devfn; /* Bus, device and function */ > struct { > @@ -246,3 +248,9 @@ static inline void mmio_config_writel(void __iomem *pos, u32 val) > # define x86_default_pci_init_irq NULL > # define x86_default_pci_fixup_irqs NULL > #endif > + > +#if defined(CONFIG_PCI) && defined(CONFIG_ACPI) > +extern bool pci_use_e820; > +#else > +#define pci_use_e820 false > +#endif > diff --git a/arch/x86/kernel/resource.c b/arch/x86/kernel/resource.c > index db2b350a37b7..bba1abd05bfe 100644 > --- a/arch/x86/kernel/resource.c > +++ b/arch/x86/kernel/resource.c > @@ -1,7 +1,8 @@ > // SPDX-License-Identifier: GPL-2.0 > -#include <linux/dev_printk.h> > #include <linux/ioport.h> > +#include <linux/printk.h> > #include <asm/e820/api.h> > +#include <asm/pci_x86.h> > > static void resource_clip(struct resource *res, resource_size_t start, > resource_size_t end) > @@ -24,14 +25,14 @@ static void resource_clip(struct resource *res, resource_size_t start, > res->start = end + 1; > } > > -void remove_e820_regions(struct device *dev, struct resource *avail) > +static void remove_e820_regions(struct resource *avail) > { > int i; > struct e820_entry *entry; > u64 e820_start, e820_end; > struct resource orig = *avail; > > - if (!(avail->flags & IORESOURCE_MEM)) > + if (!pci_use_e820) > return; > > for (i = 0; i < e820_table->nr_entries; i++) { > @@ -41,7 +42,7 @@ void remove_e820_regions(struct device *dev, struct resource *avail) > > resource_clip(avail, e820_start, e820_end); > if (orig.start != avail->start || orig.end != avail->end) { > - dev_info(dev, "clipped %pR to %pR for e820 entry [mem %#010Lx-%#010Lx]\n", > + pr_info("clipped %pR to %pR for e820 entry [mem %#010Lx-%#010Lx]\n", > &orig, avail, e820_start, e820_end); > orig = *avail; > } > @@ -55,6 +56,9 @@ void arch_remove_reservations(struct resource *avail) > * the low 1MB unconditionally, as this area is needed for some ISA > * cards requiring a memory range, e.g. the i82365 PCMCIA controller. > */ > - if (avail->flags & IORESOURCE_MEM) > + if (avail->flags & IORESOURCE_MEM) { > resource_clip(avail, BIOS_ROM_BASE, BIOS_ROM_END); > + > + remove_e820_regions(avail); > + } > } > diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c > index a4f43054bc79..2f82480fd430 100644 > --- a/arch/x86/pci/acpi.c > +++ b/arch/x86/pci/acpi.c > @@ -8,7 +8,6 @@ > #include <linux/pci-acpi.h> > #include <asm/numa.h> > #include <asm/pci_x86.h> > -#include <asm/e820/api.h> > > struct pci_root_info { > struct acpi_pci_root_info common; > @@ -20,7 +19,7 @@ struct pci_root_info { > #endif > }; > > -static bool pci_use_e820 = true; > +bool pci_use_e820 = true; > static bool pci_use_crs = true; > static bool pci_ignore_seg; > > @@ -387,11 +386,6 @@ static int pci_acpi_root_prepare_resources(struct acpi_pci_root_info *ci) > > status = acpi_pci_probe_root_resources(ci); > > - if (pci_use_e820) { > - resource_list_for_each_entry(entry, &ci->resources) > - remove_e820_regions(&device->dev, entry->res); > - } > - > if (pci_use_crs) { > resource_list_for_each_entry_safe(entry, tmp, &ci->resources) > if (resource_is_pcicfg_ioport(entry->res)) > -- > 2.36.0 >
On Wed, Jun 15, 2022 at 10:11:00AM -0500, Bjorn Helgaas wrote: > On Tue, Jun 14, 2022 at 04:47:35PM -0700, Keith Busch wrote: > > On Tue, Jun 14, 2022 at 06:01:28PM -0500, Bjorn Helgaas wrote: > > > [+cc NVMe folks] > > > > > > On Tue, Jun 14, 2022 at 07:49:27PM -0300, Guilherme G. Piccoli wrote: > > > > On 14/06/2022 12:47, Hans de Goede wrote: > > > > > [...] > > > > > > > > > > Have you looked at the log of the failed boot in the Steam Deck kernel > > > > > bugzilla? Everything there seems to work just fine and then the system > > > > > just hangs. I think that maybe it cannot find its root disk, so maybe > > > > > an NVME issue ? > > > > > > > > *Exactly* that - NVMe device is the root disk, it cannot boot since the > > > > device doesn't work, hence no rootfs =) > > > > > > Beginning of thread: https://lore.kernel.org/r/20220612144325.85366-1-hdegoede@redhat.com > > > > > > Steam Deck broke because we erroneously trimmed out the PCI host > > > bridge window where BIOS had placed most devices, successfully > > > reassigned all the PCI bridge windows and BARs, but some devices, > > > apparently including NVMe, didn't work at the new addresses. > > > > > > Do you NVMe folks know of gotchas in this area? I want to know > > > because we'd like to be able to move devices around someday to > > > make room for hot-added devices. > > > > > > This reassignment happened before drivers claimed the devices, so > > > from a PCI point of view, I don't know why the NVMe device > > > wouldn't work at the new address. > > > > The probe status quickly returns ENODEV. Based on the output (we > > don't log much, so this is just an educated guesss), I think that > > means the driver read all F's from the status register, which > > indicates we can't read it when using the reassigned memory window. > > > > Why changing memory windows may not work tends to be platform or > > device specific. Considering the renumbered windows didn't cause a > > problem for other devices, it sounds like this nvme device may be > > broken. > > It sounds like you've seen this sort of problem before, so we > shouldn't assume that it's safe to reassign BARs. I haven't seen this type of problem in years, but as I recall, it was always low-end consumer crap that couldn't deal with changing BARs; you're stuck with whatever was set after it was initially powered on. The PCI topology will reflect the expected renumbering, but whatever is happening on the other side of the PCI function seems to be unaware of the change.
diff --git a/arch/x86/include/asm/e820/api.h b/arch/x86/include/asm/e820/api.h index 5a39ed59b6db..e8f58ddd06d9 100644 --- a/arch/x86/include/asm/e820/api.h +++ b/arch/x86/include/asm/e820/api.h @@ -4,9 +4,6 @@ #include <asm/e820/types.h> -struct device; -struct resource; - extern struct e820_table *e820_table; extern struct e820_table *e820_table_kexec; extern struct e820_table *e820_table_firmware; @@ -46,8 +43,6 @@ extern void e820__register_nosave_regions(unsigned long limit_pfn); extern int e820__get_entry_type(u64 start, u64 end); -extern void remove_e820_regions(struct device *dev, struct resource *avail); - /* * Returns true iff the specified range [start,end) is completely contained inside * the ISA region. diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h index f52a886d35cf..70533fdcbf02 100644 --- a/arch/x86/include/asm/pci_x86.h +++ b/arch/x86/include/asm/pci_x86.h @@ -69,6 +69,8 @@ void pcibios_scan_specific_bus(int busn); /* pci-irq.c */ +struct pci_dev; + struct irq_info { u8 bus, devfn; /* Bus, device and function */ struct { @@ -246,3 +248,9 @@ static inline void mmio_config_writel(void __iomem *pos, u32 val) # define x86_default_pci_init_irq NULL # define x86_default_pci_fixup_irqs NULL #endif + +#if defined(CONFIG_PCI) && defined(CONFIG_ACPI) +extern bool pci_use_e820; +#else +#define pci_use_e820 false +#endif diff --git a/arch/x86/kernel/resource.c b/arch/x86/kernel/resource.c index db2b350a37b7..bba1abd05bfe 100644 --- a/arch/x86/kernel/resource.c +++ b/arch/x86/kernel/resource.c @@ -1,7 +1,8 @@ // SPDX-License-Identifier: GPL-2.0 -#include <linux/dev_printk.h> #include <linux/ioport.h> +#include <linux/printk.h> #include <asm/e820/api.h> +#include <asm/pci_x86.h> static void resource_clip(struct resource *res, resource_size_t start, resource_size_t end) @@ -24,14 +25,14 @@ static void resource_clip(struct resource *res, resource_size_t start, res->start = end + 1; } -void remove_e820_regions(struct device *dev, struct resource *avail) +static void remove_e820_regions(struct resource *avail) { int i; struct e820_entry *entry; u64 e820_start, e820_end; struct resource orig = *avail; - if (!(avail->flags & IORESOURCE_MEM)) + if (!pci_use_e820) return; for (i = 0; i < e820_table->nr_entries; i++) { @@ -41,7 +42,7 @@ void remove_e820_regions(struct device *dev, struct resource *avail) resource_clip(avail, e820_start, e820_end); if (orig.start != avail->start || orig.end != avail->end) { - dev_info(dev, "clipped %pR to %pR for e820 entry [mem %#010Lx-%#010Lx]\n", + pr_info("clipped %pR to %pR for e820 entry [mem %#010Lx-%#010Lx]\n", &orig, avail, e820_start, e820_end); orig = *avail; } @@ -55,6 +56,9 @@ void arch_remove_reservations(struct resource *avail) * the low 1MB unconditionally, as this area is needed for some ISA * cards requiring a memory range, e.g. the i82365 PCMCIA controller. */ - if (avail->flags & IORESOURCE_MEM) + if (avail->flags & IORESOURCE_MEM) { resource_clip(avail, BIOS_ROM_BASE, BIOS_ROM_END); + + remove_e820_regions(avail); + } } diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c index a4f43054bc79..2f82480fd430 100644 --- a/arch/x86/pci/acpi.c +++ b/arch/x86/pci/acpi.c @@ -8,7 +8,6 @@ #include <linux/pci-acpi.h> #include <asm/numa.h> #include <asm/pci_x86.h> -#include <asm/e820/api.h> struct pci_root_info { struct acpi_pci_root_info common; @@ -20,7 +19,7 @@ struct pci_root_info { #endif }; -static bool pci_use_e820 = true; +bool pci_use_e820 = true; static bool pci_use_crs = true; static bool pci_ignore_seg; @@ -387,11 +386,6 @@ static int pci_acpi_root_prepare_resources(struct acpi_pci_root_info *ci) status = acpi_pci_probe_root_resources(ci); - if (pci_use_e820) { - resource_list_for_each_entry(entry, &ci->resources) - remove_e820_regions(&device->dev, entry->res); - } - if (pci_use_crs) { resource_list_for_each_entry_safe(entry, tmp, &ci->resources) if (resource_is_pcicfg_ioport(entry->res))
Clipping the bridge windows directly from pci_acpi_root_prepare_resources() instead of clipping from arch_remove_reservations(), has a number of unforseen consequences. If there is an e820 reservation in the middle of a bridge window, then the smallest of the 2 remaining parts of the window will be also clipped off. Where as the previous code would clip regions requested by devices, rather then the entire window, leaving regions which were either entirely above or below a reservation in the middle of the window alone. E.g. on the Steam Deck this leads to this log message: acpi PNP0A08:00: clipped [mem 0x80000000-0xf7ffffff window] to [mem 0xa0100000-0xf7ffffff window] which then gets followed by these log messages: pci 0000:00:01.2: can't claim BAR 14 [mem 0x80600000-0x806fffff]: no compatible bridge window pci 0000:00:01.3: can't claim BAR 14 [mem 0x80500000-0x805fffff]: no compatible bridge window and many more of these. Ultimately this leads to the Steam Deck no longer booting properly, so revert the change. Note this is not a clean revert, this revert keeps the later change to make the clipping dependent on a new pci_use_e820 bool, moving the checking of this bool to arch_remove_reservations(). BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=216109 Fixes: 4c5e242d3e93 ("x86/PCI: Clip only host bridge windows for E820 regions") Reported-and-tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> --- arch/x86/include/asm/e820/api.h | 5 ----- arch/x86/include/asm/pci_x86.h | 8 ++++++++ arch/x86/kernel/resource.c | 14 +++++++++----- arch/x86/pci/acpi.c | 8 +------- 4 files changed, 18 insertions(+), 17 deletions(-)