Message ID | 20220930141737.67574-1-roger.pau@citrix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [for-4.17] x86/efi: don't translate EFI memory map MMIO regions to e820 RESERVED | expand |
On 30.09.2022 16:17, Roger Pau Monne wrote: > The EFI memory map contains two memory types (EfiMemoryMappedIO and > EfiMemoryMappedIOPortSpace) used to describe IO memory areas of > devices used by EFI. > > The current parsing of the EFI memory map was translating > EfiMemoryMappedIO and EfiMemoryMappedIOPortSpace to E820_RESERVED on > x86. This is an issue because device MMIO regions (BARs) should not > be positioned on reserved regions. Any BARs positioned on non-hole > areas of the memory map will cause is_memory_hole() to return false, > which would then cause memory decoding to be disabled for such device. > This leads to EFI firmware malfunctions when using runtime services. > > The system under which this was observed has: > > EFI memory map: > [...] > 00000fd000000-00000fe7fffff type=11 attr=800000000000100d > [...] > 0000:00:1f.5 disabled: BAR [0xfe010, 0xfe010] overlaps with memory map > > The device behind this BAR is: > > 00:1f.5 Serial bus controller [0c80]: Intel Corporation Lewisburg SPI Controller (rev 09) > Subsystem: Super Micro Computer Inc Device 091c > Flags: fast devsel > Memory at fe010000 (32-bit, non-prefetchable) [size=4K]well > > For the record, the symptom observed in that machine was a hard freeze > when attempting to set an EFI variable (XEN_EFI_set_variable). > > Fix by not adding regions with type EfiMemoryMappedIO or > EfiMemoryMappedIOPortSpace to the e820 memory map, that allows BARs to > be positioned there. > > Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably positioned') > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> In the best case this is moving us from one way of being wrong to another: So far we wrongly include BARs in E820_RESERVED (_if_ they can be legitimately covered by a EfiMemoryMappedIO region in the first place, which I'm not sure is actually permitted - iirc just like E820_RESERVED may not be used for BARs, this memory type also may not be), whereas with your change we would no longer report non-BAR MMIO space (chipset specific ranges for example) as reserved. In fact I think the example you provide is at least partly due to bogus firmware behavior: The BAR is put in space normally used for firmware specific memory (MMIO) ranges. I think firmware should either assign the BAR differently or exclude the range from the memory map. I guess instead we want to handle the example you give by a firmware quirk workaround. > --- > I don't understand the definition of EfiMemoryMappedIOPortSpace: > > "System memory-mapped IO region that is used to translate memory > cycles to IO cycles by the processor." That's something (only?) IA-64 used, where kind of as a "replacement" for x86 I/O port accesses equivalents thereof were provided (iirc 4 ports per page) via MMIO accesses. It is this compatibility MMIO space which is marked this way. Such ranges should never be seen on (current) x86. > But given its name I would assume it's also likely used to mark ranges > in use by PCI device BARs. > > It would also be interesting to forward this information to dom0, so > it doesn't attempt to move the BARs of this device(s) around, or else > issues will arise. None of this is device specific. There's simply (typically) one MMIO range covering the entire 64k or I/O port space. Jan
On Tue, Oct 04, 2022 at 11:01:18AM +0200, Jan Beulich wrote: > On 30.09.2022 16:17, Roger Pau Monne wrote: > > The EFI memory map contains two memory types (EfiMemoryMappedIO and > > EfiMemoryMappedIOPortSpace) used to describe IO memory areas of > > devices used by EFI. > > > > The current parsing of the EFI memory map was translating > > EfiMemoryMappedIO and EfiMemoryMappedIOPortSpace to E820_RESERVED on > > x86. This is an issue because device MMIO regions (BARs) should not > > be positioned on reserved regions. Any BARs positioned on non-hole > > areas of the memory map will cause is_memory_hole() to return false, > > which would then cause memory decoding to be disabled for such device. > > This leads to EFI firmware malfunctions when using runtime services. > > > > The system under which this was observed has: > > > > EFI memory map: > > [...] > > 00000fd000000-00000fe7fffff type=11 attr=800000000000100d > > [...] > > 0000:00:1f.5 disabled: BAR [0xfe010, 0xfe010] overlaps with memory map > > > > The device behind this BAR is: > > > > 00:1f.5 Serial bus controller [0c80]: Intel Corporation Lewisburg SPI Controller (rev 09) > > Subsystem: Super Micro Computer Inc Device 091c > > Flags: fast devsel > > Memory at fe010000 (32-bit, non-prefetchable) [size=4K]well > > > > For the record, the symptom observed in that machine was a hard freeze > > when attempting to set an EFI variable (XEN_EFI_set_variable). > > > > Fix by not adding regions with type EfiMemoryMappedIO or > > EfiMemoryMappedIOPortSpace to the e820 memory map, that allows BARs to > > be positioned there. > > > > Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably positioned') > > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> > > In the best case this is moving us from one way of being wrong to another: > So far we wrongly include BARs in E820_RESERVED (_if_ they can be > legitimately covered by a EfiMemoryMappedIO region in the first place, > which I'm not sure is actually permitted - iirc just like E820_RESERVED > may not be used for BARs, this memory type also may not be), whereas with > your change we would no longer report non-BAR MMIO space (chipset specific > ranges for example) as reserved. In fact I think the example you provide > is at least partly due to bogus firmware behavior: The BAR is put in space > normally used for firmware specific memory (MMIO) ranges. I think firmware > should either assign the BAR differently or exclude the range from the > memory map. Hm, I'm not sure the example is bogus, how would firmware request a BAR to be mapped for run time services to access it otherwise if it's not using EfiMemoryMappedIO? Not adding the BAR to the memory map in any way would mean the OS is free to not map it for runtime services to access. > I guess instead we want to handle the example you give by a firmware quirk > workaround. I'm unconvinced we need a quirk for this. AFAICT such usage of EfiMemoryMappedIO doesn't go against the UEFI spec, and hence we need to handle it without requiring specific firmware quirks. > > --- > > I don't understand the definition of EfiMemoryMappedIOPortSpace: > > > > "System memory-mapped IO region that is used to translate memory > > cycles to IO cycles by the processor." > > That's something (only?) IA-64 used, where kind of as a "replacement" for > x86 I/O port accesses equivalents thereof were provided (iirc 4 ports > per page) via MMIO accesses. It is this compatibility MMIO space which is > marked this way. Such ranges should never be seen on (current) x86. I've heard the Arm guys speak about something similar. There's a clarification note in newer versions of the UEFI spec: "Note: There is only one region of type EfiMemoryMappedIoPortSpace defined in the architecture for Itanium-based platforms. As a result, there should be one and only one region of type EfiMemoryMappedIoPortSpace in the EFI memory map of an Itanium-based platform." > > But given its name I would assume it's also likely used to mark ranges > > in use by PCI device BARs. > > > > It would also be interesting to forward this information to dom0, so > > it doesn't attempt to move the BARs of this device(s) around, or else > > issues will arise. > > None of this is device specific. There's simply (typically) one MMIO > range covering the entire 64k or I/O port space. So this translation region won't be in a BAR of a host bridge for example? Thanks, Roger.
On 04.10.2022 11:27, Roger Pau Monné wrote: > On Tue, Oct 04, 2022 at 11:01:18AM +0200, Jan Beulich wrote: >> On 30.09.2022 16:17, Roger Pau Monne wrote: >>> The EFI memory map contains two memory types (EfiMemoryMappedIO and >>> EfiMemoryMappedIOPortSpace) used to describe IO memory areas of >>> devices used by EFI. >>> >>> The current parsing of the EFI memory map was translating >>> EfiMemoryMappedIO and EfiMemoryMappedIOPortSpace to E820_RESERVED on >>> x86. This is an issue because device MMIO regions (BARs) should not >>> be positioned on reserved regions. Any BARs positioned on non-hole >>> areas of the memory map will cause is_memory_hole() to return false, >>> which would then cause memory decoding to be disabled for such device. >>> This leads to EFI firmware malfunctions when using runtime services. >>> >>> The system under which this was observed has: >>> >>> EFI memory map: >>> [...] >>> 00000fd000000-00000fe7fffff type=11 attr=800000000000100d >>> [...] >>> 0000:00:1f.5 disabled: BAR [0xfe010, 0xfe010] overlaps with memory map >>> >>> The device behind this BAR is: >>> >>> 00:1f.5 Serial bus controller [0c80]: Intel Corporation Lewisburg SPI Controller (rev 09) >>> Subsystem: Super Micro Computer Inc Device 091c >>> Flags: fast devsel >>> Memory at fe010000 (32-bit, non-prefetchable) [size=4K]well >>> >>> For the record, the symptom observed in that machine was a hard freeze >>> when attempting to set an EFI variable (XEN_EFI_set_variable). >>> >>> Fix by not adding regions with type EfiMemoryMappedIO or >>> EfiMemoryMappedIOPortSpace to the e820 memory map, that allows BARs to >>> be positioned there. >>> >>> Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably positioned') >>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> >> >> In the best case this is moving us from one way of being wrong to another: >> So far we wrongly include BARs in E820_RESERVED (_if_ they can be >> legitimately covered by a EfiMemoryMappedIO region in the first place, >> which I'm not sure is actually permitted - iirc just like E820_RESERVED >> may not be used for BARs, this memory type also may not be), whereas with >> your change we would no longer report non-BAR MMIO space (chipset specific >> ranges for example) as reserved. In fact I think the example you provide >> is at least partly due to bogus firmware behavior: The BAR is put in space >> normally used for firmware specific memory (MMIO) ranges. I think firmware >> should either assign the BAR differently or exclude the range from the >> memory map. > > Hm, I'm not sure the example is bogus, how would firmware request a BAR > to be mapped for run time services to access it otherwise if it's not > using EfiMemoryMappedIO? > > Not adding the BAR to the memory map in any way would mean the OS is > free to not map it for runtime services to access. My view is that BARs should not be marked for runtime services use. Doing so requires awareness of the driver inside the OS, which I don't think one can expect. If firmware needs to make use of a device in a system, it ought to properly hide it from the OS. Note how the potential sharing of an RTC requires special provisions in the spec, mandating driver awareness. Having a BAR expressed in the memory map also contradicts the ability of an OS to relocate all BARs of all devices, if necessary. >> I guess instead we want to handle the example you give by a firmware quirk >> workaround. > > I'm unconvinced we need a quirk for this. AFAICT such usage of > EfiMemoryMappedIO doesn't go against the UEFI spec, and hence we need > to handle it without requiring specific firmware quirks. > >>> --- >>> I don't understand the definition of EfiMemoryMappedIOPortSpace: >>> >>> "System memory-mapped IO region that is used to translate memory >>> cycles to IO cycles by the processor." >> >> That's something (only?) IA-64 used, where kind of as a "replacement" for >> x86 I/O port accesses equivalents thereof were provided (iirc 4 ports >> per page) via MMIO accesses. It is this compatibility MMIO space which is >> marked this way. Such ranges should never be seen on (current) x86. > > I've heard the Arm guys speak about something similar. > > There's a clarification note in newer versions of the UEFI spec: > > "Note: There is only one region of type EfiMemoryMappedIoPortSpace > defined in the architecture for Itanium-based platforms. As a result, > there should be one and only one region of type > EfiMemoryMappedIoPortSpace in the EFI memory map of an Itanium-based > platform." > >>> But given its name I would assume it's also likely used to mark ranges >>> in use by PCI device BARs. >>> >>> It would also be interesting to forward this information to dom0, so >>> it doesn't attempt to move the BARs of this device(s) around, or else >>> issues will arise. >> >> None of this is device specific. There's simply (typically) one MMIO >> range covering the entire 64k or I/O port space. > > So this translation region won't be in a BAR of a host bridge for > example? I have to admit that I don't recall at which layer the conversion happens. I also didn't think (host) bridges would typically have BARs. Bridges (but iirc not host bridges) have bridge windows, but that's different. Jan
On Tue, Oct 04, 2022 at 12:40:10PM +0200, Jan Beulich wrote: > On 04.10.2022 11:27, Roger Pau Monné wrote: > > On Tue, Oct 04, 2022 at 11:01:18AM +0200, Jan Beulich wrote: > >> On 30.09.2022 16:17, Roger Pau Monne wrote: > >>> The EFI memory map contains two memory types (EfiMemoryMappedIO and > >>> EfiMemoryMappedIOPortSpace) used to describe IO memory areas of > >>> devices used by EFI. > >>> > >>> The current parsing of the EFI memory map was translating > >>> EfiMemoryMappedIO and EfiMemoryMappedIOPortSpace to E820_RESERVED on > >>> x86. This is an issue because device MMIO regions (BARs) should not > >>> be positioned on reserved regions. Any BARs positioned on non-hole > >>> areas of the memory map will cause is_memory_hole() to return false, > >>> which would then cause memory decoding to be disabled for such device. > >>> This leads to EFI firmware malfunctions when using runtime services. > >>> > >>> The system under which this was observed has: > >>> > >>> EFI memory map: > >>> [...] > >>> 00000fd000000-00000fe7fffff type=11 attr=800000000000100d > >>> [...] > >>> 0000:00:1f.5 disabled: BAR [0xfe010, 0xfe010] overlaps with memory map > >>> > >>> The device behind this BAR is: > >>> > >>> 00:1f.5 Serial bus controller [0c80]: Intel Corporation Lewisburg SPI Controller (rev 09) > >>> Subsystem: Super Micro Computer Inc Device 091c > >>> Flags: fast devsel > >>> Memory at fe010000 (32-bit, non-prefetchable) [size=4K]well > >>> > >>> For the record, the symptom observed in that machine was a hard freeze > >>> when attempting to set an EFI variable (XEN_EFI_set_variable). > >>> > >>> Fix by not adding regions with type EfiMemoryMappedIO or > >>> EfiMemoryMappedIOPortSpace to the e820 memory map, that allows BARs to > >>> be positioned there. > >>> > >>> Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably positioned') > >>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> > >> > >> In the best case this is moving us from one way of being wrong to another: > >> So far we wrongly include BARs in E820_RESERVED (_if_ they can be > >> legitimately covered by a EfiMemoryMappedIO region in the first place, > >> which I'm not sure is actually permitted - iirc just like E820_RESERVED > >> may not be used for BARs, this memory type also may not be), whereas with > >> your change we would no longer report non-BAR MMIO space (chipset specific > >> ranges for example) as reserved. In fact I think the example you provide > >> is at least partly due to bogus firmware behavior: The BAR is put in space > >> normally used for firmware specific memory (MMIO) ranges. I think firmware > >> should either assign the BAR differently or exclude the range from the > >> memory map. > > > > Hm, I'm not sure the example is bogus, how would firmware request a BAR > > to be mapped for run time services to access it otherwise if it's not > > using EfiMemoryMappedIO? > > > > Not adding the BAR to the memory map in any way would mean the OS is > > free to not map it for runtime services to access. > > My view is that BARs should not be marked for runtime services use. Doing > so requires awareness of the driver inside the OS, which I don't think > one can expect. If firmware needs to make use of a device in a system, it > ought to properly hide it from the OS. Note how the potential sharing of > an RTC requires special provisions in the spec, mandating driver awareness. > > Having a BAR expressed in the memory map also contradicts the ability of > an OS to relocate all BARs of all devices, if necessary. I've failed to figure out if there's a way in UEFI to report a device is in use by the firmware. I've already looked before sending the patch (see also the post commit notes about for example not passing through the device to any guest for obvious reason). I've got no idea if Linux has any checks to avoid trying to move BARs residing in EfiMemoryMappedIO ranges, we have now observed this behavior in two systems already. Maybe we could do a special check for PCI devices and allow them having BARs in EfiMemoryMappedIO, together with printing a warning message. Thanks, Roger.
On 04.10.2022 14:17, Roger Pau Monné wrote: > On Tue, Oct 04, 2022 at 12:40:10PM +0200, Jan Beulich wrote: >> On 04.10.2022 11:27, Roger Pau Monné wrote: >>> On Tue, Oct 04, 2022 at 11:01:18AM +0200, Jan Beulich wrote: >>>> On 30.09.2022 16:17, Roger Pau Monne wrote: >>>>> The EFI memory map contains two memory types (EfiMemoryMappedIO and >>>>> EfiMemoryMappedIOPortSpace) used to describe IO memory areas of >>>>> devices used by EFI. >>>>> >>>>> The current parsing of the EFI memory map was translating >>>>> EfiMemoryMappedIO and EfiMemoryMappedIOPortSpace to E820_RESERVED on >>>>> x86. This is an issue because device MMIO regions (BARs) should not >>>>> be positioned on reserved regions. Any BARs positioned on non-hole >>>>> areas of the memory map will cause is_memory_hole() to return false, >>>>> which would then cause memory decoding to be disabled for such device. >>>>> This leads to EFI firmware malfunctions when using runtime services. >>>>> >>>>> The system under which this was observed has: >>>>> >>>>> EFI memory map: >>>>> [...] >>>>> 00000fd000000-00000fe7fffff type=11 attr=800000000000100d >>>>> [...] >>>>> 0000:00:1f.5 disabled: BAR [0xfe010, 0xfe010] overlaps with memory map >>>>> >>>>> The device behind this BAR is: >>>>> >>>>> 00:1f.5 Serial bus controller [0c80]: Intel Corporation Lewisburg SPI Controller (rev 09) >>>>> Subsystem: Super Micro Computer Inc Device 091c >>>>> Flags: fast devsel >>>>> Memory at fe010000 (32-bit, non-prefetchable) [size=4K]well >>>>> >>>>> For the record, the symptom observed in that machine was a hard freeze >>>>> when attempting to set an EFI variable (XEN_EFI_set_variable). >>>>> >>>>> Fix by not adding regions with type EfiMemoryMappedIO or >>>>> EfiMemoryMappedIOPortSpace to the e820 memory map, that allows BARs to >>>>> be positioned there. >>>>> >>>>> Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably positioned') >>>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> >>>> >>>> In the best case this is moving us from one way of being wrong to another: >>>> So far we wrongly include BARs in E820_RESERVED (_if_ they can be >>>> legitimately covered by a EfiMemoryMappedIO region in the first place, >>>> which I'm not sure is actually permitted - iirc just like E820_RESERVED >>>> may not be used for BARs, this memory type also may not be), whereas with >>>> your change we would no longer report non-BAR MMIO space (chipset specific >>>> ranges for example) as reserved. In fact I think the example you provide >>>> is at least partly due to bogus firmware behavior: The BAR is put in space >>>> normally used for firmware specific memory (MMIO) ranges. I think firmware >>>> should either assign the BAR differently or exclude the range from the >>>> memory map. >>> >>> Hm, I'm not sure the example is bogus, how would firmware request a BAR >>> to be mapped for run time services to access it otherwise if it's not >>> using EfiMemoryMappedIO? >>> >>> Not adding the BAR to the memory map in any way would mean the OS is >>> free to not map it for runtime services to access. >> >> My view is that BARs should not be marked for runtime services use. Doing >> so requires awareness of the driver inside the OS, which I don't think >> one can expect. If firmware needs to make use of a device in a system, it >> ought to properly hide it from the OS. Note how the potential sharing of >> an RTC requires special provisions in the spec, mandating driver awareness. >> >> Having a BAR expressed in the memory map also contradicts the ability of >> an OS to relocate all BARs of all devices, if necessary. > > I've failed to figure out if there's a way in UEFI to report a device > is in use by the firmware. I've already looked before sending the > patch (see also the post commit notes about for example not passing > through the device to any guest for obvious reason). > > I've got no idea if Linux has any checks to avoid trying to move BARs > residing in EfiMemoryMappedIO ranges, we have now observed this > behavior in two systems already. > > Maybe we could do a special check for PCI devices and allow them > having BARs in EfiMemoryMappedIO, together with printing a warning > message. Right, that's one of the possible quirk workarounds I was thinking of. At the risk of stating the obvious - the same would presumably apply to E820_RESERVED on non-EFI systems then. Jan
On Tue, Oct 04, 2022 at 02:21:20PM +0200, Jan Beulich wrote: > On 04.10.2022 14:17, Roger Pau Monné wrote: > > On Tue, Oct 04, 2022 at 12:40:10PM +0200, Jan Beulich wrote: > >> On 04.10.2022 11:27, Roger Pau Monné wrote: > >>> On Tue, Oct 04, 2022 at 11:01:18AM +0200, Jan Beulich wrote: > >>>> On 30.09.2022 16:17, Roger Pau Monne wrote: > >>>>> The EFI memory map contains two memory types (EfiMemoryMappedIO and > >>>>> EfiMemoryMappedIOPortSpace) used to describe IO memory areas of > >>>>> devices used by EFI. > >>>>> > >>>>> The current parsing of the EFI memory map was translating > >>>>> EfiMemoryMappedIO and EfiMemoryMappedIOPortSpace to E820_RESERVED on > >>>>> x86. This is an issue because device MMIO regions (BARs) should not > >>>>> be positioned on reserved regions. Any BARs positioned on non-hole > >>>>> areas of the memory map will cause is_memory_hole() to return false, > >>>>> which would then cause memory decoding to be disabled for such device. > >>>>> This leads to EFI firmware malfunctions when using runtime services. > >>>>> > >>>>> The system under which this was observed has: > >>>>> > >>>>> EFI memory map: > >>>>> [...] > >>>>> 00000fd000000-00000fe7fffff type=11 attr=800000000000100d > >>>>> [...] > >>>>> 0000:00:1f.5 disabled: BAR [0xfe010, 0xfe010] overlaps with memory map > >>>>> > >>>>> The device behind this BAR is: > >>>>> > >>>>> 00:1f.5 Serial bus controller [0c80]: Intel Corporation Lewisburg SPI Controller (rev 09) > >>>>> Subsystem: Super Micro Computer Inc Device 091c > >>>>> Flags: fast devsel > >>>>> Memory at fe010000 (32-bit, non-prefetchable) [size=4K]well > >>>>> > >>>>> For the record, the symptom observed in that machine was a hard freeze > >>>>> when attempting to set an EFI variable (XEN_EFI_set_variable). > >>>>> > >>>>> Fix by not adding regions with type EfiMemoryMappedIO or > >>>>> EfiMemoryMappedIOPortSpace to the e820 memory map, that allows BARs to > >>>>> be positioned there. > >>>>> > >>>>> Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably positioned') > >>>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> > >>>> > >>>> In the best case this is moving us from one way of being wrong to another: > >>>> So far we wrongly include BARs in E820_RESERVED (_if_ they can be > >>>> legitimately covered by a EfiMemoryMappedIO region in the first place, > >>>> which I'm not sure is actually permitted - iirc just like E820_RESERVED > >>>> may not be used for BARs, this memory type also may not be), whereas with > >>>> your change we would no longer report non-BAR MMIO space (chipset specific > >>>> ranges for example) as reserved. In fact I think the example you provide > >>>> is at least partly due to bogus firmware behavior: The BAR is put in space > >>>> normally used for firmware specific memory (MMIO) ranges. I think firmware > >>>> should either assign the BAR differently or exclude the range from the > >>>> memory map. > >>> > >>> Hm, I'm not sure the example is bogus, how would firmware request a BAR > >>> to be mapped for run time services to access it otherwise if it's not > >>> using EfiMemoryMappedIO? > >>> > >>> Not adding the BAR to the memory map in any way would mean the OS is > >>> free to not map it for runtime services to access. > >> > >> My view is that BARs should not be marked for runtime services use. Doing > >> so requires awareness of the driver inside the OS, which I don't think > >> one can expect. If firmware needs to make use of a device in a system, it > >> ought to properly hide it from the OS. Note how the potential sharing of > >> an RTC requires special provisions in the spec, mandating driver awareness. > >> > >> Having a BAR expressed in the memory map also contradicts the ability of > >> an OS to relocate all BARs of all devices, if necessary. > > > > I've failed to figure out if there's a way in UEFI to report a device > > is in use by the firmware. I've already looked before sending the > > patch (see also the post commit notes about for example not passing > > through the device to any guest for obvious reason). > > > > I've got no idea if Linux has any checks to avoid trying to move BARs > > residing in EfiMemoryMappedIO ranges, we have now observed this > > behavior in two systems already. > > > > Maybe we could do a special check for PCI devices and allow them > > having BARs in EfiMemoryMappedIO, together with printing a warning > > message. > > Right, that's one of the possible quirk workarounds I was thinking of. > At the risk of stating the obvious - the same would presumably apply to > E820_RESERVED on non-EFI systems then. One option would be to strictly limit to EfiMemoryMappedIO, by taking the EFI memory map into account also if present. Another maybe simpler option is to allow BARs to be placed in E820_RESERVED regions, and translate EfiMemoryMappedIO into E820_RESERVED like we have been doing. I will attempt the later if you are OK with the approach. Thanks, Roger.
On 04.10.2022 14:59, Roger Pau Monné wrote: > On Tue, Oct 04, 2022 at 02:21:20PM +0200, Jan Beulich wrote: >> On 04.10.2022 14:17, Roger Pau Monné wrote: >>> On Tue, Oct 04, 2022 at 12:40:10PM +0200, Jan Beulich wrote: >>>> On 04.10.2022 11:27, Roger Pau Monné wrote: >>>>> On Tue, Oct 04, 2022 at 11:01:18AM +0200, Jan Beulich wrote: >>>>>> On 30.09.2022 16:17, Roger Pau Monne wrote: >>>>>>> The EFI memory map contains two memory types (EfiMemoryMappedIO and >>>>>>> EfiMemoryMappedIOPortSpace) used to describe IO memory areas of >>>>>>> devices used by EFI. >>>>>>> >>>>>>> The current parsing of the EFI memory map was translating >>>>>>> EfiMemoryMappedIO and EfiMemoryMappedIOPortSpace to E820_RESERVED on >>>>>>> x86. This is an issue because device MMIO regions (BARs) should not >>>>>>> be positioned on reserved regions. Any BARs positioned on non-hole >>>>>>> areas of the memory map will cause is_memory_hole() to return false, >>>>>>> which would then cause memory decoding to be disabled for such device. >>>>>>> This leads to EFI firmware malfunctions when using runtime services. >>>>>>> >>>>>>> The system under which this was observed has: >>>>>>> >>>>>>> EFI memory map: >>>>>>> [...] >>>>>>> 00000fd000000-00000fe7fffff type=11 attr=800000000000100d >>>>>>> [...] >>>>>>> 0000:00:1f.5 disabled: BAR [0xfe010, 0xfe010] overlaps with memory map >>>>>>> >>>>>>> The device behind this BAR is: >>>>>>> >>>>>>> 00:1f.5 Serial bus controller [0c80]: Intel Corporation Lewisburg SPI Controller (rev 09) >>>>>>> Subsystem: Super Micro Computer Inc Device 091c >>>>>>> Flags: fast devsel >>>>>>> Memory at fe010000 (32-bit, non-prefetchable) [size=4K]well >>>>>>> >>>>>>> For the record, the symptom observed in that machine was a hard freeze >>>>>>> when attempting to set an EFI variable (XEN_EFI_set_variable). >>>>>>> >>>>>>> Fix by not adding regions with type EfiMemoryMappedIO or >>>>>>> EfiMemoryMappedIOPortSpace to the e820 memory map, that allows BARs to >>>>>>> be positioned there. >>>>>>> >>>>>>> Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably positioned') >>>>>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> >>>>>> >>>>>> In the best case this is moving us from one way of being wrong to another: >>>>>> So far we wrongly include BARs in E820_RESERVED (_if_ they can be >>>>>> legitimately covered by a EfiMemoryMappedIO region in the first place, >>>>>> which I'm not sure is actually permitted - iirc just like E820_RESERVED >>>>>> may not be used for BARs, this memory type also may not be), whereas with >>>>>> your change we would no longer report non-BAR MMIO space (chipset specific >>>>>> ranges for example) as reserved. In fact I think the example you provide >>>>>> is at least partly due to bogus firmware behavior: The BAR is put in space >>>>>> normally used for firmware specific memory (MMIO) ranges. I think firmware >>>>>> should either assign the BAR differently or exclude the range from the >>>>>> memory map. >>>>> >>>>> Hm, I'm not sure the example is bogus, how would firmware request a BAR >>>>> to be mapped for run time services to access it otherwise if it's not >>>>> using EfiMemoryMappedIO? >>>>> >>>>> Not adding the BAR to the memory map in any way would mean the OS is >>>>> free to not map it for runtime services to access. >>>> >>>> My view is that BARs should not be marked for runtime services use. Doing >>>> so requires awareness of the driver inside the OS, which I don't think >>>> one can expect. If firmware needs to make use of a device in a system, it >>>> ought to properly hide it from the OS. Note how the potential sharing of >>>> an RTC requires special provisions in the spec, mandating driver awareness. >>>> >>>> Having a BAR expressed in the memory map also contradicts the ability of >>>> an OS to relocate all BARs of all devices, if necessary. >>> >>> I've failed to figure out if there's a way in UEFI to report a device >>> is in use by the firmware. I've already looked before sending the >>> patch (see also the post commit notes about for example not passing >>> through the device to any guest for obvious reason). >>> >>> I've got no idea if Linux has any checks to avoid trying to move BARs >>> residing in EfiMemoryMappedIO ranges, we have now observed this >>> behavior in two systems already. >>> >>> Maybe we could do a special check for PCI devices and allow them >>> having BARs in EfiMemoryMappedIO, together with printing a warning >>> message. >> >> Right, that's one of the possible quirk workarounds I was thinking of. >> At the risk of stating the obvious - the same would presumably apply to >> E820_RESERVED on non-EFI systems then. > > One option would be to strictly limit to EfiMemoryMappedIO, by taking > the EFI memory map into account also if present. > > Another maybe simpler option is to allow BARs to be placed in > E820_RESERVED regions, and translate EfiMemoryMappedIO into > E820_RESERVED like we have been doing. > > I will attempt the later if you are OK with the approach. I might be okay with the approach, but first of all I continue to be worried of us violating specifications if we make this the default behavior. Jan
On Tue, Oct 04, 2022 at 03:15:02PM +0200, Jan Beulich wrote: > On 04.10.2022 14:59, Roger Pau Monné wrote: > > On Tue, Oct 04, 2022 at 02:21:20PM +0200, Jan Beulich wrote: > >> On 04.10.2022 14:17, Roger Pau Monné wrote: > >>> On Tue, Oct 04, 2022 at 12:40:10PM +0200, Jan Beulich wrote: > >>>> On 04.10.2022 11:27, Roger Pau Monné wrote: > >>>>> On Tue, Oct 04, 2022 at 11:01:18AM +0200, Jan Beulich wrote: > >>>>>> On 30.09.2022 16:17, Roger Pau Monne wrote: > >>>>>>> The EFI memory map contains two memory types (EfiMemoryMappedIO and > >>>>>>> EfiMemoryMappedIOPortSpace) used to describe IO memory areas of > >>>>>>> devices used by EFI. > >>>>>>> > >>>>>>> The current parsing of the EFI memory map was translating > >>>>>>> EfiMemoryMappedIO and EfiMemoryMappedIOPortSpace to E820_RESERVED on > >>>>>>> x86. This is an issue because device MMIO regions (BARs) should not > >>>>>>> be positioned on reserved regions. Any BARs positioned on non-hole > >>>>>>> areas of the memory map will cause is_memory_hole() to return false, > >>>>>>> which would then cause memory decoding to be disabled for such device. > >>>>>>> This leads to EFI firmware malfunctions when using runtime services. > >>>>>>> > >>>>>>> The system under which this was observed has: > >>>>>>> > >>>>>>> EFI memory map: > >>>>>>> [...] > >>>>>>> 00000fd000000-00000fe7fffff type=11 attr=800000000000100d > >>>>>>> [...] > >>>>>>> 0000:00:1f.5 disabled: BAR [0xfe010, 0xfe010] overlaps with memory map > >>>>>>> > >>>>>>> The device behind this BAR is: > >>>>>>> > >>>>>>> 00:1f.5 Serial bus controller [0c80]: Intel Corporation Lewisburg SPI Controller (rev 09) > >>>>>>> Subsystem: Super Micro Computer Inc Device 091c > >>>>>>> Flags: fast devsel > >>>>>>> Memory at fe010000 (32-bit, non-prefetchable) [size=4K]well > >>>>>>> > >>>>>>> For the record, the symptom observed in that machine was a hard freeze > >>>>>>> when attempting to set an EFI variable (XEN_EFI_set_variable). > >>>>>>> > >>>>>>> Fix by not adding regions with type EfiMemoryMappedIO or > >>>>>>> EfiMemoryMappedIOPortSpace to the e820 memory map, that allows BARs to > >>>>>>> be positioned there. > >>>>>>> > >>>>>>> Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably positioned') > >>>>>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> > >>>>>> > >>>>>> In the best case this is moving us from one way of being wrong to another: > >>>>>> So far we wrongly include BARs in E820_RESERVED (_if_ they can be > >>>>>> legitimately covered by a EfiMemoryMappedIO region in the first place, > >>>>>> which I'm not sure is actually permitted - iirc just like E820_RESERVED > >>>>>> may not be used for BARs, this memory type also may not be), whereas with > >>>>>> your change we would no longer report non-BAR MMIO space (chipset specific > >>>>>> ranges for example) as reserved. In fact I think the example you provide > >>>>>> is at least partly due to bogus firmware behavior: The BAR is put in space > >>>>>> normally used for firmware specific memory (MMIO) ranges. I think firmware > >>>>>> should either assign the BAR differently or exclude the range from the > >>>>>> memory map. > >>>>> > >>>>> Hm, I'm not sure the example is bogus, how would firmware request a BAR > >>>>> to be mapped for run time services to access it otherwise if it's not > >>>>> using EfiMemoryMappedIO? > >>>>> > >>>>> Not adding the BAR to the memory map in any way would mean the OS is > >>>>> free to not map it for runtime services to access. > >>>> > >>>> My view is that BARs should not be marked for runtime services use. Doing > >>>> so requires awareness of the driver inside the OS, which I don't think > >>>> one can expect. If firmware needs to make use of a device in a system, it > >>>> ought to properly hide it from the OS. Note how the potential sharing of > >>>> an RTC requires special provisions in the spec, mandating driver awareness. > >>>> > >>>> Having a BAR expressed in the memory map also contradicts the ability of > >>>> an OS to relocate all BARs of all devices, if necessary. > >>> > >>> I've failed to figure out if there's a way in UEFI to report a device > >>> is in use by the firmware. I've already looked before sending the > >>> patch (see also the post commit notes about for example not passing > >>> through the device to any guest for obvious reason). > >>> > >>> I've got no idea if Linux has any checks to avoid trying to move BARs > >>> residing in EfiMemoryMappedIO ranges, we have now observed this > >>> behavior in two systems already. > >>> > >>> Maybe we could do a special check for PCI devices and allow them > >>> having BARs in EfiMemoryMappedIO, together with printing a warning > >>> message. > >> > >> Right, that's one of the possible quirk workarounds I was thinking of. > >> At the risk of stating the obvious - the same would presumably apply to > >> E820_RESERVED on non-EFI systems then. > > > > One option would be to strictly limit to EfiMemoryMappedIO, by taking > > the EFI memory map into account also if present. > > > > Another maybe simpler option is to allow BARs to be placed in > > E820_RESERVED regions, and translate EfiMemoryMappedIO into > > E820_RESERVED like we have been doing. > > > > I will attempt the later if you are OK with the approach. > > I might be okay with the approach, but first of all I continue to be > worried of us violating specifications if we make this the default > behavior. In any case it would be the firmware violating the specification by not hiding those PCI devices, Xen is just trying to cope. Xen went from not checking the position of the BARs at all to enforcing them to not be placed over any regions on the memory map. I think we need to relax the checks a bit to match reality. Thanks, Roger.
On 04.10.2022 15:55, Roger Pau Monné wrote: > On Tue, Oct 04, 2022 at 03:15:02PM +0200, Jan Beulich wrote: >> On 04.10.2022 14:59, Roger Pau Monné wrote: >>> On Tue, Oct 04, 2022 at 02:21:20PM +0200, Jan Beulich wrote: >>>> On 04.10.2022 14:17, Roger Pau Monné wrote: >>>>> On Tue, Oct 04, 2022 at 12:40:10PM +0200, Jan Beulich wrote: >>>>>> On 04.10.2022 11:27, Roger Pau Monné wrote: >>>>>>> On Tue, Oct 04, 2022 at 11:01:18AM +0200, Jan Beulich wrote: >>>>>>>> On 30.09.2022 16:17, Roger Pau Monne wrote: >>>>>>>>> The EFI memory map contains two memory types (EfiMemoryMappedIO and >>>>>>>>> EfiMemoryMappedIOPortSpace) used to describe IO memory areas of >>>>>>>>> devices used by EFI. >>>>>>>>> >>>>>>>>> The current parsing of the EFI memory map was translating >>>>>>>>> EfiMemoryMappedIO and EfiMemoryMappedIOPortSpace to E820_RESERVED on >>>>>>>>> x86. This is an issue because device MMIO regions (BARs) should not >>>>>>>>> be positioned on reserved regions. Any BARs positioned on non-hole >>>>>>>>> areas of the memory map will cause is_memory_hole() to return false, >>>>>>>>> which would then cause memory decoding to be disabled for such device. >>>>>>>>> This leads to EFI firmware malfunctions when using runtime services. >>>>>>>>> >>>>>>>>> The system under which this was observed has: >>>>>>>>> >>>>>>>>> EFI memory map: >>>>>>>>> [...] >>>>>>>>> 00000fd000000-00000fe7fffff type=11 attr=800000000000100d >>>>>>>>> [...] >>>>>>>>> 0000:00:1f.5 disabled: BAR [0xfe010, 0xfe010] overlaps with memory map >>>>>>>>> >>>>>>>>> The device behind this BAR is: >>>>>>>>> >>>>>>>>> 00:1f.5 Serial bus controller [0c80]: Intel Corporation Lewisburg SPI Controller (rev 09) >>>>>>>>> Subsystem: Super Micro Computer Inc Device 091c >>>>>>>>> Flags: fast devsel >>>>>>>>> Memory at fe010000 (32-bit, non-prefetchable) [size=4K]well >>>>>>>>> >>>>>>>>> For the record, the symptom observed in that machine was a hard freeze >>>>>>>>> when attempting to set an EFI variable (XEN_EFI_set_variable). >>>>>>>>> >>>>>>>>> Fix by not adding regions with type EfiMemoryMappedIO or >>>>>>>>> EfiMemoryMappedIOPortSpace to the e820 memory map, that allows BARs to >>>>>>>>> be positioned there. >>>>>>>>> >>>>>>>>> Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably positioned') >>>>>>>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> >>>>>>>> >>>>>>>> In the best case this is moving us from one way of being wrong to another: >>>>>>>> So far we wrongly include BARs in E820_RESERVED (_if_ they can be >>>>>>>> legitimately covered by a EfiMemoryMappedIO region in the first place, >>>>>>>> which I'm not sure is actually permitted - iirc just like E820_RESERVED >>>>>>>> may not be used for BARs, this memory type also may not be), whereas with >>>>>>>> your change we would no longer report non-BAR MMIO space (chipset specific >>>>>>>> ranges for example) as reserved. In fact I think the example you provide >>>>>>>> is at least partly due to bogus firmware behavior: The BAR is put in space >>>>>>>> normally used for firmware specific memory (MMIO) ranges. I think firmware >>>>>>>> should either assign the BAR differently or exclude the range from the >>>>>>>> memory map. >>>>>>> >>>>>>> Hm, I'm not sure the example is bogus, how would firmware request a BAR >>>>>>> to be mapped for run time services to access it otherwise if it's not >>>>>>> using EfiMemoryMappedIO? >>>>>>> >>>>>>> Not adding the BAR to the memory map in any way would mean the OS is >>>>>>> free to not map it for runtime services to access. >>>>>> >>>>>> My view is that BARs should not be marked for runtime services use. Doing >>>>>> so requires awareness of the driver inside the OS, which I don't think >>>>>> one can expect. If firmware needs to make use of a device in a system, it >>>>>> ought to properly hide it from the OS. Note how the potential sharing of >>>>>> an RTC requires special provisions in the spec, mandating driver awareness. >>>>>> >>>>>> Having a BAR expressed in the memory map also contradicts the ability of >>>>>> an OS to relocate all BARs of all devices, if necessary. >>>>> >>>>> I've failed to figure out if there's a way in UEFI to report a device >>>>> is in use by the firmware. I've already looked before sending the >>>>> patch (see also the post commit notes about for example not passing >>>>> through the device to any guest for obvious reason). >>>>> >>>>> I've got no idea if Linux has any checks to avoid trying to move BARs >>>>> residing in EfiMemoryMappedIO ranges, we have now observed this >>>>> behavior in two systems already. >>>>> >>>>> Maybe we could do a special check for PCI devices and allow them >>>>> having BARs in EfiMemoryMappedIO, together with printing a warning >>>>> message. >>>> >>>> Right, that's one of the possible quirk workarounds I was thinking of. >>>> At the risk of stating the obvious - the same would presumably apply to >>>> E820_RESERVED on non-EFI systems then. >>> >>> One option would be to strictly limit to EfiMemoryMappedIO, by taking >>> the EFI memory map into account also if present. >>> >>> Another maybe simpler option is to allow BARs to be placed in >>> E820_RESERVED regions, and translate EfiMemoryMappedIO into >>> E820_RESERVED like we have been doing. >>> >>> I will attempt the later if you are OK with the approach. >> >> I might be okay with the approach, but first of all I continue to be >> worried of us violating specifications if we make this the default >> behavior. > > In any case it would be the firmware violating the specification by > not hiding those PCI devices, Xen is just trying to cope. > > Xen went from not checking the position of the BARs at all to > enforcing them to not be placed over any regions on the memory map. I > think we need to relax the checks a bit to match reality. Sure, as a workaround. Yet we don't want to relax too much, or else a primary purpose of the check will be lost. Jan
diff --git a/xen/arch/x86/efi/efi-boot.h b/xen/arch/x86/efi/efi-boot.h index 836e8c2ba1..12ad94cd71 100644 --- a/xen/arch/x86/efi/efi-boot.h +++ b/xen/arch/x86/efi/efi-boot.h @@ -169,6 +169,14 @@ static void __init efi_arch_process_memory_map(EFI_SYSTEM_TABLE *SystemTable, switch ( desc->Type ) { + case EfiMemoryMappedIO: + case EfiMemoryMappedIOPortSpace: + /* + * There no suitable e820 memory type to represent a MMIO area + * except a hole. + */ + continue; + case EfiBootServicesCode: case EfiBootServicesData: if ( map_bs )
The EFI memory map contains two memory types (EfiMemoryMappedIO and EfiMemoryMappedIOPortSpace) used to describe IO memory areas of devices used by EFI. The current parsing of the EFI memory map was translating EfiMemoryMappedIO and EfiMemoryMappedIOPortSpace to E820_RESERVED on x86. This is an issue because device MMIO regions (BARs) should not be positioned on reserved regions. Any BARs positioned on non-hole areas of the memory map will cause is_memory_hole() to return false, which would then cause memory decoding to be disabled for such device. This leads to EFI firmware malfunctions when using runtime services. The system under which this was observed has: EFI memory map: [...] 00000fd000000-00000fe7fffff type=11 attr=800000000000100d [...] 0000:00:1f.5 disabled: BAR [0xfe010, 0xfe010] overlaps with memory map The device behind this BAR is: 00:1f.5 Serial bus controller [0c80]: Intel Corporation Lewisburg SPI Controller (rev 09) Subsystem: Super Micro Computer Inc Device 091c Flags: fast devsel Memory at fe010000 (32-bit, non-prefetchable) [size=4K]well For the record, the symptom observed in that machine was a hard freeze when attempting to set an EFI variable (XEN_EFI_set_variable). Fix by not adding regions with type EfiMemoryMappedIO or EfiMemoryMappedIOPortSpace to the e820 memory map, that allows BARs to be positioned there. Fixes: 75cc460a1b ('xen/pci: detect when BARs are not suitably positioned') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> --- I don't understand the definition of EfiMemoryMappedIOPortSpace: "System memory-mapped IO region that is used to translate memory cycles to IO cycles by the processor." But given its name I would assume it's also likely used to mark ranges in use by PCI device BARs. It would also be interesting to forward this information to dom0, so it doesn't attempt to move the BARs of this device(s) around, or else issues will arise. And of course the device must not be passed through to domUs, as disabling memory decoding on it can lead to a host DoS. Not that it makes much sense to pass devices like the one above to guests. It also makes me wonder whether this playing with the decoding bit that we do in Xen is safe. It might be more resilient to only disable memory decoding when the BARs overlap a RAM region, as that would indeed cause issues. We should also consider moving away from the e820 and instead using the EFI memory map for things like is_memory_hole(), but that would involve translating e820 memory maps into EFI format, which might be easier and more reliable than the other way around which we currently do. --- xen/arch/x86/efi/efi-boot.h | 8 ++++++++ 1 file changed, 8 insertions(+)