Message ID | 20221004153645.3686-1-roger.pau@citrix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [for-4.17] x86/pci: allow BARs to be positioned on e820 reserved regions | expand |
On 04.10.2022 17:36, Roger Pau Monne wrote: > The EFI memory map contains two memory types (EfiMemoryMappedIO and > EfiMemoryMappedIOPortSpace) used to describe IO memory areas used by > EFI firmware. > > The current parsing of the EFI memory map is translating > EfiMemoryMappedIO to E820_RESERVED on x86. This causes issues on some > boxes as the firmware is relying on using those regions to position > the BARs of devices being used (possibly during runtime) for the > firmware. > > Xen will disable memory decoding on any device that has BARs > positioned over any regions on the e820 memory map, hence the firmware > will malfunction after Xen turning off memory decoding for the > device(s) that have BARs mapped in EfiMemoryMappedIO regions. > > The system under which this was observed has: > > EFI memory map: > [...] > 00000fd000000-00000fe7fffff type=11 attr=800000000000100d > [...] > 0000:00:1f.5 disabled: BAR [0xfe010, 0xfe010] overlaps with memory map > > The device behind this BAR is: > > 00:1f.5 Serial bus controller [0c80]: Intel Corporation Lewisburg SPI Controller (rev 09) > Subsystem: Super Micro Computer Inc Device 091c > Flags: fast devsel > Memory at fe010000 (32-bit, non-prefetchable) [size=4K]well > > For the record, the symptom observed in that machine was a hard freeze > when attempting to set an EFI variable (XEN_EFI_set_variable). > > Fix by allowing BARs of PCI devices to be positioned over reserved > memory regions, but print a warning message about such overlap. Somewhat hesitantly I might ack this change, but I'd like to give others (Andrew in particular) some time to voice their views. As said during the earlier discussion - I think we're relaxing things too much by going this route. > --- a/xen/arch/x86/pci.c > +++ b/xen/arch/x86/pci.c > @@ -98,3 +98,30 @@ int pci_conf_write_intercept(unsigned int seg, unsigned int bdf, > > return rc; > } > + > +bool pci_check_bar(const struct pci_dev *pdev, mfn_t start, mfn_t end) > +{ > + unsigned long mfn; > + > + /* > + * Check if BAR is not overlapping with any memory region defined > + * in the memory map. > + */ > + if ( is_memory_hole(start, end) ) > + return true; > + > + /* > + * Also allow BARs placed on reserved regions in order to deal with EFI > + * firmware using EfiMemoryMappedIO regions to place the BARs of devices > + * that can be used during runtime. But print a warning when doing so. > + */ > + for ( mfn = mfn_x(start); mfn <= mfn_x(end); mfn++ ) > + if ( !page_is_ram_type(mfn, RAM_TYPE_RESERVED) ) > + return false; We don't need to be arch-independent here and hence instead of this (potentially long) loop can't we use a single call to e820_all_mapped()? Jan
On Tue, Oct 04, 2022 at 06:11:50PM +0200, Jan Beulich wrote: > On 04.10.2022 17:36, Roger Pau Monne wrote: > > The EFI memory map contains two memory types (EfiMemoryMappedIO and > > EfiMemoryMappedIOPortSpace) used to describe IO memory areas used by > > EFI firmware. > > > > The current parsing of the EFI memory map is translating > > EfiMemoryMappedIO to E820_RESERVED on x86. This causes issues on some > > boxes as the firmware is relying on using those regions to position > > the BARs of devices being used (possibly during runtime) for the > > firmware. > > > > Xen will disable memory decoding on any device that has BARs > > positioned over any regions on the e820 memory map, hence the firmware > > will malfunction after Xen turning off memory decoding for the > > device(s) that have BARs mapped in EfiMemoryMappedIO regions. > > > > The system under which this was observed has: > > > > EFI memory map: > > [...] > > 00000fd000000-00000fe7fffff type=11 attr=800000000000100d > > [...] > > 0000:00:1f.5 disabled: BAR [0xfe010, 0xfe010] overlaps with memory map > > > > The device behind this BAR is: > > > > 00:1f.5 Serial bus controller [0c80]: Intel Corporation Lewisburg SPI Controller (rev 09) > > Subsystem: Super Micro Computer Inc Device 091c > > Flags: fast devsel > > Memory at fe010000 (32-bit, non-prefetchable) [size=4K]well > > > > For the record, the symptom observed in that machine was a hard freeze > > when attempting to set an EFI variable (XEN_EFI_set_variable). > > > > Fix by allowing BARs of PCI devices to be positioned over reserved > > memory regions, but print a warning message about such overlap. > > Somewhat hesitantly I might ack this change, but I'd like to give > others (Andrew in particular) some time to voice their views. As > said during the earlier discussion - I think we're relaxing things > too much by going this route. Another option would be to explicitly check in efi_memmap for EfiMemoryMappedIO regions and BAR overlap and only allow those. That would be more cumbersome code wise AFAICT. > > --- a/xen/arch/x86/pci.c > > +++ b/xen/arch/x86/pci.c > > @@ -98,3 +98,30 @@ int pci_conf_write_intercept(unsigned int seg, unsigned int bdf, > > > > return rc; > > } > > + > > +bool pci_check_bar(const struct pci_dev *pdev, mfn_t start, mfn_t end) > > +{ > > + unsigned long mfn; > > + > > + /* > > + * Check if BAR is not overlapping with any memory region defined > > + * in the memory map. > > + */ > > + if ( is_memory_hole(start, end) ) > > + return true; > > + > > + /* > > + * Also allow BARs placed on reserved regions in order to deal with EFI > > + * firmware using EfiMemoryMappedIO regions to place the BARs of devices > > + * that can be used during runtime. But print a warning when doing so. > > + */ > > + for ( mfn = mfn_x(start); mfn <= mfn_x(end); mfn++ ) > > + if ( !page_is_ram_type(mfn, RAM_TYPE_RESERVED) ) > > + return false; > > We don't need to be arch-independent here and hence instead of this > (potentially long) loop can't we use a single call to e820_all_mapped()? Sure, was searching for a range checking functions but wrongly looked into mm.c instead of e820. I would have to make the function non-init, but I think that's fine. Thanks, Roger.
On 05.10.2022 10:37, Roger Pau Monné wrote: > On Tue, Oct 04, 2022 at 06:11:50PM +0200, Jan Beulich wrote: >> On 04.10.2022 17:36, Roger Pau Monne wrote: >>> The EFI memory map contains two memory types (EfiMemoryMappedIO and >>> EfiMemoryMappedIOPortSpace) used to describe IO memory areas used by >>> EFI firmware. >>> >>> The current parsing of the EFI memory map is translating >>> EfiMemoryMappedIO to E820_RESERVED on x86. This causes issues on some >>> boxes as the firmware is relying on using those regions to position >>> the BARs of devices being used (possibly during runtime) for the >>> firmware. >>> >>> Xen will disable memory decoding on any device that has BARs >>> positioned over any regions on the e820 memory map, hence the firmware >>> will malfunction after Xen turning off memory decoding for the >>> device(s) that have BARs mapped in EfiMemoryMappedIO regions. >>> >>> The system under which this was observed has: >>> >>> EFI memory map: >>> [...] >>> 00000fd000000-00000fe7fffff type=11 attr=800000000000100d >>> [...] >>> 0000:00:1f.5 disabled: BAR [0xfe010, 0xfe010] overlaps with memory map >>> >>> The device behind this BAR is: >>> >>> 00:1f.5 Serial bus controller [0c80]: Intel Corporation Lewisburg SPI Controller (rev 09) >>> Subsystem: Super Micro Computer Inc Device 091c >>> Flags: fast devsel >>> Memory at fe010000 (32-bit, non-prefetchable) [size=4K]well >>> >>> For the record, the symptom observed in that machine was a hard freeze >>> when attempting to set an EFI variable (XEN_EFI_set_variable). >>> >>> Fix by allowing BARs of PCI devices to be positioned over reserved >>> memory regions, but print a warning message about such overlap. >> >> Somewhat hesitantly I might ack this change, but I'd like to give >> others (Andrew in particular) some time to voice their views. As >> said during the earlier discussion - I think we're relaxing things >> too much by going this route. > > Another option would be to explicitly check in efi_memmap for > EfiMemoryMappedIO regions and BAR overlap and only allow those. That > would be more cumbersome code wise AFAICT. Indeed there's a question of balancing well here, between two outcomes neither of which is really desirable. Jan
On Wed, Oct 05, 2022 at 10:53:38AM +0200, Jan Beulich wrote: > On 05.10.2022 10:37, Roger Pau Monné wrote: > > On Tue, Oct 04, 2022 at 06:11:50PM +0200, Jan Beulich wrote: > >> On 04.10.2022 17:36, Roger Pau Monne wrote: > >>> The EFI memory map contains two memory types (EfiMemoryMappedIO and > >>> EfiMemoryMappedIOPortSpace) used to describe IO memory areas used by > >>> EFI firmware. > >>> > >>> The current parsing of the EFI memory map is translating > >>> EfiMemoryMappedIO to E820_RESERVED on x86. This causes issues on some > >>> boxes as the firmware is relying on using those regions to position > >>> the BARs of devices being used (possibly during runtime) for the > >>> firmware. > >>> > >>> Xen will disable memory decoding on any device that has BARs > >>> positioned over any regions on the e820 memory map, hence the firmware > >>> will malfunction after Xen turning off memory decoding for the > >>> device(s) that have BARs mapped in EfiMemoryMappedIO regions. > >>> > >>> The system under which this was observed has: > >>> > >>> EFI memory map: > >>> [...] > >>> 00000fd000000-00000fe7fffff type=11 attr=800000000000100d > >>> [...] > >>> 0000:00:1f.5 disabled: BAR [0xfe010, 0xfe010] overlaps with memory map > >>> > >>> The device behind this BAR is: > >>> > >>> 00:1f.5 Serial bus controller [0c80]: Intel Corporation Lewisburg SPI Controller (rev 09) > >>> Subsystem: Super Micro Computer Inc Device 091c > >>> Flags: fast devsel > >>> Memory at fe010000 (32-bit, non-prefetchable) [size=4K]well > >>> > >>> For the record, the symptom observed in that machine was a hard freeze > >>> when attempting to set an EFI variable (XEN_EFI_set_variable). > >>> > >>> Fix by allowing BARs of PCI devices to be positioned over reserved > >>> memory regions, but print a warning message about such overlap. > >> > >> Somewhat hesitantly I might ack this change, but I'd like to give > >> others (Andrew in particular) some time to voice their views. As > >> said during the earlier discussion - I think we're relaxing things > >> too much by going this route. > > > > Another option would be to explicitly check in efi_memmap for > > EfiMemoryMappedIO regions and BAR overlap and only allow those. That > > would be more cumbersome code wise AFAICT. > > Indeed there's a question of balancing well here, between two outcomes > neither of which is really desirable. Hm, I have the following diff attached below which is more limited, and not so bad I think. Initially I wanted to introduce an efi_all_mapped() helper, but that would require exposing EFI_MEMORY_TYPE which is quite intrusive. Let me know if you think the proposal below is better and I will formally send a patch with it. Thanks, Roger. --- diff --git a/xen/arch/x86/include/asm/pci.h b/xen/arch/x86/include/asm/pci.h index f4a58c8acf..c8e1a9ecdb 100644 --- a/xen/arch/x86/include/asm/pci.h +++ b/xen/arch/x86/include/asm/pci.h @@ -57,14 +57,4 @@ static always_inline bool is_pci_passthrough_enabled(void) void arch_pci_init_pdev(struct pci_dev *pdev); -static inline bool pci_check_bar(const struct pci_dev *pdev, - mfn_t start, mfn_t end) -{ - /* - * Check if BAR is not overlapping with any memory region defined - * in the memory map. - */ - return is_memory_hole(start, end); -} - #endif /* __X86_PCI_H__ */ diff --git a/xen/arch/x86/pci.c b/xen/arch/x86/pci.c index 97b792e578..c3737e226d 100644 --- a/xen/arch/x86/pci.c +++ b/xen/arch/x86/pci.c @@ -4,6 +4,7 @@ * Architecture-dependent PCI access functions. */ +#include <xen/efi.h> #include <xen/spinlock.h> #include <xen/pci.h> #include <asm/io.h> @@ -98,3 +99,28 @@ int pci_conf_write_intercept(unsigned int seg, unsigned int bdf, return rc; } + +bool pci_check_bar(const struct pci_dev *pdev, mfn_t start, mfn_t end) +{ + /* + * Check if BAR is not overlapping with any memory region defined + * in the memory map. + */ + if ( is_memory_hole(start, end) ) + return true; + + /* + * Also allow BARs placed on EfiMemoryMappedIO regions in order to deal + * with EFI firmware using those regions to place the BARs of devices that + * can be used during runtime. But print a warning when doing so. + */ + if ( !efi_all_runtime_mmio(mfn_to_maddr(start), + mfn_to_maddr(mfn_add(end, 1))) ) + return false; + + printk(XENLOG_WARNING + "%pp: BAR [%#" PRI_mfn ", %#" PRI_mfn "] overlaps reserved region\n", + &pdev->sbdf, mfn_x(start), mfn_x(end)); + + return true; +} diff --git a/xen/common/efi/runtime.c b/xen/common/efi/runtime.c index 13b0975866..b69c710ce3 100644 --- a/xen/common/efi/runtime.c +++ b/xen/common/efi/runtime.c @@ -78,6 +78,30 @@ bool efi_enabled(unsigned int feature) return test_bit(feature, &efi_flags); } +/* + * This function checks if the entire range [start,end) is contained inside of + * a single EfiMemoryMappedIO descriptor with the runtime attribute set. + */ +bool efi_all_runtime_mmio(uint64_t start, uint64_t end) +{ + unsigned int i; + + for ( i = 0; i < efi_memmap_size; i += efi_mdesc_size ) + { + EFI_MEMORY_DESCRIPTOR *desc = efi_memmap + i; + uint64_t len = desc->NumberOfPages << EFI_PAGE_SHIFT; + + if ( desc->Type != EfiMemoryMappedIO || + !(desc->Attribute & EFI_MEMORY_RUNTIME) ) + continue; + + if ( start >= desc->PhysicalStart && end <= desc->PhysicalStart + len ) + return true; + } + + return false; +} + #ifndef CONFIG_ARM /* TODO - disabled until implemented on ARM */ struct efi_rs_state efi_rs_enter(void) diff --git a/xen/include/xen/efi.h b/xen/include/xen/efi.h index 94a7e547f9..f29ea1a320 100644 --- a/xen/include/xen/efi.h +++ b/xen/include/xen/efi.h @@ -45,6 +45,8 @@ int efi_runtime_call(struct xenpf_efi_runtime_call *); int efi_compat_get_info(uint32_t idx, union compat_pf_efi_info *); int efi_compat_runtime_call(struct compat_pf_efi_runtime_call *); +bool efi_all_runtime_mmio(uint64_t start, uint64_t end); + #endif /* !__ASSEMBLY__ */ #endif /* __XEN_EFI_H__ */ diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h index 5975ca2f30..64995fc68d 100644 --- a/xen/include/xen/pci.h +++ b/xen/include/xen/pci.h @@ -211,6 +211,7 @@ unsigned int pci_size_mem_bar(pci_sbdf_t sbdf, unsigned int pos, void pci_intx(const struct pci_dev *, bool enable); bool_t pcie_aer_get_firmware_first(const struct pci_dev *); +bool pci_check_bar(const struct pci_dev *pdev, mfn_t start, mfn_t end); struct pirq; int msixtbl_pt_register(struct domain *, struct pirq *, uint64_t gtable);
On 05.10.2022 16:12, Roger Pau Monné wrote: > Hm, I have the following diff attached below which is more limited, > and not so bad I think. Initially I wanted to introduce an > efi_all_mapped() helper, but that would require exposing EFI_MEMORY_TYPE > which is quite intrusive. > > Let me know if you think the proposal below is better and I will > formally send a patch with it. Hmm, personally I like this slightly better for, as you say, its more limited effect. Objectively, however, I'm still unconvinced of making this an EFI special case. How would non-EFI firmware go about communicating that it is going to access a device at runtime (which, as said before, I consider a no-go in the first place)? Likely by putting its BAR range(s) in an E820_RESERVED region. Plus the MMIO range covered on the system in question is pretty large. That way we're still allowing pretty wide an abuse by the firmware. Furthermore the MCFG range would also be covered by an EfiMemoryMappedIO descriptor (in fact that's the only use of the type I had been aware of so far). IOW the change specifically permits an overlap of a BAR with an MCFG range. Who's the manufacturer of the system? Or put in different words - how likely is it that we could first gain understanding on their intentions with this region? You did say the system hangs hard without some kind of workaround, but that doesn't clarify to me in how far a use of the device by the firmware was actually involved there. Have you considered other routes towards dealing with the issue? One approach coming to mind would build on top of what you've been doing so far (either variant): Besides avoiding the turning off of memory decode, also invoke pci_ro_device(), thus protecting it from having its BARs relocated, and also preventing any driver in Dom0 to gain control of it, thus avoiding two parties competing for the device. Relocating the BAR outside of the reserved region would be nice, but will likely not resolve the hang. In any event I'm still hoping to have a 3rd view on the situation as a whole, irrespective of specific ideas towards possible workarounds ... Independent of the above a couple of purely cosmetic comments: > @@ -98,3 +99,28 @@ int pci_conf_write_intercept(unsigned int seg, unsigned int bdf, > > return rc; > } > + > +bool pci_check_bar(const struct pci_dev *pdev, mfn_t start, mfn_t end) > +{ > + /* > + * Check if BAR is not overlapping with any memory region defined > + * in the memory map. > + */ > + if ( is_memory_hole(start, end) ) > + return true; > + > + /* > + * Also allow BARs placed on EfiMemoryMappedIO regions in order to deal > + * with EFI firmware using those regions to place the BARs of devices that > + * can be used during runtime. But print a warning when doing so. > + */ > + if ( !efi_all_runtime_mmio(mfn_to_maddr(start), > + mfn_to_maddr(mfn_add(end, 1))) ) > + return false; > + > + printk(XENLOG_WARNING > + "%pp: BAR [%#" PRI_mfn ", %#" PRI_mfn "] overlaps reserved region\n", > + &pdev->sbdf, mfn_x(start), mfn_x(end)); Perhaps re-word the message now that the check is a different one? > --- a/xen/common/efi/runtime.c > +++ b/xen/common/efi/runtime.c > @@ -78,6 +78,30 @@ bool efi_enabled(unsigned int feature) > return test_bit(feature, &efi_flags); > } > > +/* > + * This function checks if the entire range [start,end) is contained inside of > + * a single EfiMemoryMappedIO descriptor with the runtime attribute set. > + */ > +bool efi_all_runtime_mmio(uint64_t start, uint64_t end) > +{ > + unsigned int i; > + > + for ( i = 0; i < efi_memmap_size; i += efi_mdesc_size ) > + { > + EFI_MEMORY_DESCRIPTOR *desc = efi_memmap + i; const? > + uint64_t len = desc->NumberOfPages << EFI_PAGE_SHIFT; > + > + if ( desc->Type != EfiMemoryMappedIO || > + !(desc->Attribute & EFI_MEMORY_RUNTIME) ) > + continue; > + > + if ( start >= desc->PhysicalStart && end <= desc->PhysicalStart + len ) > + return true; > + } > + > + return false; > +} > + > #ifndef CONFIG_ARM /* TODO - disabled until implemented on ARM */ Perhaps put the function inside this #ifdef? Jan
On Wed, Oct 05, 2022 at 05:42:08PM +0200, Jan Beulich wrote: > On 05.10.2022 16:12, Roger Pau Monné wrote: > > Hm, I have the following diff attached below which is more limited, > > and not so bad I think. Initially I wanted to introduce an > > efi_all_mapped() helper, but that would require exposing EFI_MEMORY_TYPE > > which is quite intrusive. > > > > Let me know if you think the proposal below is better and I will > > formally send a patch with it. > > Hmm, personally I like this slightly better for, as you say, its more > limited effect. Objectively, however, I'm still unconvinced of making > this an EFI special case. How would non-EFI firmware go about > communicating that it is going to access a device at runtime (which, > as said before, I consider a no-go in the first place)? Likely by > putting its BAR range(s) in an E820_RESERVED region. I've never encountered a non-EFI firmware that attempts to access the BAR of a device so far (or at least none that caused problems with Xen I should say), so I would worry about such use-case when we find one. > Plus the MMIO range covered on the system in question is pretty large. > That way we're still allowing pretty wide an abuse by the firmware. > Furthermore the MCFG range would also be covered by an > EfiMemoryMappedIO descriptor (in fact that's the only use of the type > I had been aware of so far). IOW the change specifically permits an > overlap of a BAR with an MCFG range. Additionally I could check if the device overlaps with any known MCFG regions in pci_check_bar(), again not sure if that's more fine grained than needed. > > Who's the manufacturer of the system? It's from Supermicro: "Supermicro X11DPU BIOS" > Or put in different words - how > likely is it that we could first gain understanding on their > intentions with this region? You did say the system hangs hard without > some kind of workaround, but that doesn't clarify to me in how far a > use of the device by the firmware was actually involved there. It's a black box to me, so I have no idea what the firmware attempts to do. It looks like that BAR is used to store some information related to the boot sequence, the hang happened when trying to create a new variable bootnum using efibootmgr (that's just a guess really). > Have you considered other routes towards dealing with the issue? One > approach coming to mind would build on top of what you've been doing > so far (either variant): Besides avoiding the turning off of memory > decode, also invoke pci_ro_device(), thus protecting it from having > its BARs relocated, and also preventing any driver in Dom0 to gain > control of it, thus avoiding two parties competing for the device. IMO using pci_ro_device() would be going too far here - it's not Xen the entity using the device, so Xen doesn't really know whether there's already some interface between the firmware and the OS driver (or just OS) in order to signal that the device is being used by the firmware. > Relocating the BAR outside of the reserved region would be nice, but > will likely not resolve the hang. Not an option for Xen due to not having enough information about the memory layout, and it's risky IMO anyway. > In any event I'm still hoping to have a 3rd view on the situation as a > whole, irrespective of specific ideas towards possible workarounds ... Thanks for the detailed feedback. Roger.
On 06.10.2022 10:48, Roger Pau Monné wrote: > On Wed, Oct 05, 2022 at 05:42:08PM +0200, Jan Beulich wrote: >> On 05.10.2022 16:12, Roger Pau Monné wrote: >>> Hm, I have the following diff attached below which is more limited, >>> and not so bad I think. Initially I wanted to introduce an >>> efi_all_mapped() helper, but that would require exposing EFI_MEMORY_TYPE >>> which is quite intrusive. >>> >>> Let me know if you think the proposal below is better and I will >>> formally send a patch with it. >> >> Hmm, personally I like this slightly better for, as you say, its more >> limited effect. Objectively, however, I'm still unconvinced of making >> this an EFI special case. How would non-EFI firmware go about >> communicating that it is going to access a device at runtime (which, >> as said before, I consider a no-go in the first place)? Likely by >> putting its BAR range(s) in an E820_RESERVED region. > > I've never encountered a non-EFI firmware that attempts to access the > BAR of a device so far (or at least none that caused problems with > Xen I should say), so I would worry about such use-case when we find > one. > >> Plus the MMIO range covered on the system in question is pretty large. >> That way we're still allowing pretty wide an abuse by the firmware. >> Furthermore the MCFG range would also be covered by an >> EfiMemoryMappedIO descriptor (in fact that's the only use of the type >> I had been aware of so far). IOW the change specifically permits an >> overlap of a BAR with an MCFG range. > > Additionally I could check if the device overlaps with any known MCFG > regions in pci_check_bar(), again not sure if that's more fine grained > than needed. > >> >> Who's the manufacturer of the system? > > It's from Supermicro: "Supermicro X11DPU BIOS" > >> Or put in different words - how >> likely is it that we could first gain understanding on their >> intentions with this region? You did say the system hangs hard without >> some kind of workaround, but that doesn't clarify to me in how far a >> use of the device by the firmware was actually involved there. > > It's a black box to me, so I have no idea what the firmware attempts > to do. Right - I don't expect there's a realistic chance of getting hold of a firmware person of theirs. > It looks like that BAR is used to store some information > related to the boot sequence, the hang happened when trying to create > a new variable bootnum using efibootmgr (that's just a guess > really). > >> Have you considered other routes towards dealing with the issue? One >> approach coming to mind would build on top of what you've been doing >> so far (either variant): Besides avoiding the turning off of memory >> decode, also invoke pci_ro_device(), thus protecting it from having >> its BARs relocated, and also preventing any driver in Dom0 to gain >> control of it, thus avoiding two parties competing for the device. > > IMO using pci_ro_device() would be going too far here - it's not Xen > the entity using the device, so Xen doesn't really know whether > there's already some interface between the firmware and the OS driver > (or just OS) in order to signal that the device is being used by the > firmware. Fair point (I did think of that, but ended up not spelling it out). Jan
On Thu, Oct 06, 2022 at 11:02:32AM +0200, Jan Beulich wrote: > On 06.10.2022 10:48, Roger Pau Monné wrote: > > On Wed, Oct 05, 2022 at 05:42:08PM +0200, Jan Beulich wrote: > >> Or put in different words - how > >> likely is it that we could first gain understanding on their > >> intentions with this region? You did say the system hangs hard without > >> some kind of workaround, but that doesn't clarify to me in how far a > >> use of the device by the firmware was actually involved there. > > > > It's a black box to me, so I have no idea what the firmware attempts > > to do. > > Right - I don't expect there's a realistic chance of getting hold of a > firmware person of theirs. Even if somehow I could manage to do that (which I think it's unlikely), it's not going to happen before the release, and there are production boxes out there that have this behavior and we need to deal with them. Thanks, Roger.
diff --git a/xen/arch/arm/include/asm/pci.h b/xen/arch/arm/include/asm/pci.h index 8cb46f6b71..80a2431804 100644 --- a/xen/arch/arm/include/asm/pci.h +++ b/xen/arch/arm/include/asm/pci.h @@ -126,8 +126,6 @@ int pci_host_iterate_bridges_and_count(struct domain *d, int pci_host_bridge_mappings(struct domain *d); -bool pci_check_bar(const struct pci_dev *pdev, mfn_t start, mfn_t end); - #else /*!CONFIG_HAS_PCI*/ struct arch_pci_dev { }; diff --git a/xen/arch/x86/include/asm/pci.h b/xen/arch/x86/include/asm/pci.h index f4a58c8acf..c8e1a9ecdb 100644 --- a/xen/arch/x86/include/asm/pci.h +++ b/xen/arch/x86/include/asm/pci.h @@ -57,14 +57,4 @@ static always_inline bool is_pci_passthrough_enabled(void) void arch_pci_init_pdev(struct pci_dev *pdev); -static inline bool pci_check_bar(const struct pci_dev *pdev, - mfn_t start, mfn_t end) -{ - /* - * Check if BAR is not overlapping with any memory region defined - * in the memory map. - */ - return is_memory_hole(start, end); -} - #endif /* __X86_PCI_H__ */ diff --git a/xen/arch/x86/pci.c b/xen/arch/x86/pci.c index 97b792e578..6920bf2168 100644 --- a/xen/arch/x86/pci.c +++ b/xen/arch/x86/pci.c @@ -98,3 +98,30 @@ int pci_conf_write_intercept(unsigned int seg, unsigned int bdf, return rc; } + +bool pci_check_bar(const struct pci_dev *pdev, mfn_t start, mfn_t end) +{ + unsigned long mfn; + + /* + * Check if BAR is not overlapping with any memory region defined + * in the memory map. + */ + if ( is_memory_hole(start, end) ) + return true; + + /* + * Also allow BARs placed on reserved regions in order to deal with EFI + * firmware using EfiMemoryMappedIO regions to place the BARs of devices + * that can be used during runtime. But print a warning when doing so. + */ + for ( mfn = mfn_x(start); mfn <= mfn_x(end); mfn++ ) + if ( !page_is_ram_type(mfn, RAM_TYPE_RESERVED) ) + return false; + + printk(XENLOG_WARNING + "%pp: BAR [%#" PRI_mfn ", %#" PRI_mfn "] overlaps reserved region\n", + &pdev->sbdf, mfn_x(start), mfn_x(end)); + + return true; +} diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h index 5975ca2f30..64995fc68d 100644 --- a/xen/include/xen/pci.h +++ b/xen/include/xen/pci.h @@ -211,6 +211,7 @@ unsigned int pci_size_mem_bar(pci_sbdf_t sbdf, unsigned int pos, void pci_intx(const struct pci_dev *, bool enable); bool_t pcie_aer_get_firmware_first(const struct pci_dev *); +bool pci_check_bar(const struct pci_dev *pdev, mfn_t start, mfn_t end); struct pirq; int msixtbl_pt_register(struct domain *, struct pirq *, uint64_t gtable);
The EFI memory map contains two memory types (EfiMemoryMappedIO and EfiMemoryMappedIOPortSpace) used to describe IO memory areas used by EFI firmware. The current parsing of the EFI memory map is translating EfiMemoryMappedIO to E820_RESERVED on x86. This causes issues on some boxes as the firmware is relying on using those regions to position the BARs of devices being used (possibly during runtime) for the firmware. Xen will disable memory decoding on any device that has BARs positioned over any regions on the e820 memory map, hence the firmware will malfunction after Xen turning off memory decoding for the device(s) that have BARs mapped in EfiMemoryMappedIO regions. The system under which this was observed has: EFI memory map: [...] 00000fd000000-00000fe7fffff type=11 attr=800000000000100d [...] 0000:00:1f.5 disabled: BAR [0xfe010, 0xfe010] overlaps with memory map The device behind this BAR is: 00:1f.5 Serial bus controller [0c80]: Intel Corporation Lewisburg SPI Controller (rev 09) Subsystem: Super Micro Computer Inc Device 091c Flags: fast devsel Memory at fe010000 (32-bit, non-prefetchable) [size=4K]well For the record, the symptom observed in that machine was a hard freeze when attempting to set an EFI variable (XEN_EFI_set_variable). Fix by allowing BARs of PCI devices to be positioned over reserved memory regions, but print a warning message about such overlap. Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably positioned') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> --- xen/arch/arm/include/asm/pci.h | 2 -- xen/arch/x86/include/asm/pci.h | 10 ---------- xen/arch/x86/pci.c | 27 +++++++++++++++++++++++++++ xen/include/xen/pci.h | 1 + 4 files changed, 28 insertions(+), 12 deletions(-)