Message ID | 20250217141602.64014-2-roger.pau@citrix.com (mailing list archive)
---|---
State | Superseded
Series | x86/dom0: be less restrictive with the Interrupt Address Range
On 17.02.2025 15:16, Roger Pau Monne wrote:
> The current code in arch_iommu_hwdom_init() kind of open-codes the same
> MMIO permission ranges that are added to the hardware domain ->iomem_caps.
> Avoid this duplication and use ->iomem_caps in arch_iommu_hwdom_init() to
> filter which memory regions should be added to the dom0 IOMMU page-tables.
>
> This should result in no change in the memory regions that end up identity
> mapped in the domain IOMMU page tables.
>
> Note the IO-APIC and MCFG page(s) must be set as not accessible for a PVH
> dom0, otherwise the internal Xen emulation for those ranges won't work.
> This requires an adjustment in dom0_setup_permissions().
>
> Also the special casing of E820_UNUSABLE regions no longer needs to be done
> in arch_iommu_hwdom_init(), as those regions are already blocked in
> ->iomem_caps and thus would be removed from the rangeset as part of
> ->iomem_caps processing in arch_iommu_hwdom_init().

Only almost: ->iomem_caps has them removed only for addresses 1Mb and up.

> --- a/xen/arch/x86/dom0_build.c
> +++ b/xen/arch/x86/dom0_build.c
> @@ -552,7 +552,8 @@ int __init dom0_setup_permissions(struct domain *d)
>      for ( i = 0; i < nr_ioapics; i++ )
>      {
>          mfn = paddr_to_pfn(mp_ioapics[i].mpc_apicaddr);
> -        if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
> +        if ( is_hvm_domain(d) ||
> +             !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
>              rc |= iomem_deny_access(d, mfn, mfn);
>      }

The original code used has_vioapic() and had a comment. At least one of
the two would better be transplanted here, I think.

> @@ -593,6 +594,22 @@ int __init dom0_setup_permissions(struct domain *d)
>          rc |= rangeset_add_singleton(mmio_ro_ranges, mfn);
>      }
>
> +    /* For PVH dom0 prevent access to MCFG, it's emulated by Xen. */
> +    if ( is_hvm_domain(d) )
> +    {
> +        for ( i = 0; i < pci_mmcfg_config_num; i++ )
> +        {
> +            const unsigned long s =
> +                PFN_DOWN(pci_mmcfg_config[i].address) +
> +                PCI_BDF(pci_mmcfg_config[i].start_bus_number, 0, 0);
> +            const unsigned long e =
> +                PFN_DOWN(pci_mmcfg_config[i].address) +
> +                PCI_BDF(pci_mmcfg_config[i].end_bus_number, ~0, ~0);
> +
> +            rc |= iomem_deny_access(d, s, e);
> +        }
> +    }

Similarly here, the original code used has_vpci() and also had a comment.
Is there actually a strong reason to replace the original MCFG code?

> --- a/xen/drivers/passthrough/x86/iommu.c
> +++ b/xen/drivers/passthrough/x86/iommu.c
> @@ -320,6 +320,26 @@ static int __hwdom_init cf_check map_subtract(unsigned long s, unsigned long e,
>      return rangeset_remove_range(map, s, e);
>  }
>
> +struct handle_iomemcap {
> +    struct rangeset *r;
> +    unsigned long last;
> +};
> +static int __hwdom_init cf_check map_subtract_iomemcap(unsigned long s,
> +                                                       unsigned long e,
> +                                                       void *data)
> +{
> +    struct handle_iomemcap *h = data;
> +    int rc = 0;
> +
> +    if ( h->last != s )
> +        rc = rangeset_remove_range(h->r, h->last, s - 1);
> +
> +    /* Be careful with overflows. */
> +    h->last = max(e + 1, e);

What overflow could occur here? Addresses are limited to 52 bits; frame
numbers are limited to 40 bits. And ->iomem_caps starts out with a range
ending at the last address permitted by paddr_bits.

> @@ -475,22 +488,13 @@ void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
>      if ( rc )
>          panic("IOMMU failed to remove Xen ranges: %d\n", rc);
>
> -    /* Remove any overlap with the Interrupt Address Range. */
> -    rc = rangeset_remove_range(map, 0xfee00, 0xfeeff);
> +    iomem.r = map;
> +    rc = rangeset_report_ranges(d->iomem_caps, 0, ~0UL, map_subtract_iomemcap,
> +                                &iomem);
> +    if ( !rc && iomem.last < ~0UL )
> +        rc = rangeset_remove_range(map, iomem.last, ~0UL);

Similarly here I'm wondering about the upper bound of ~0UL. Is this just
to be on the safe side? And/or just because it's simpler than calculating
the actual upper bound? No ranges above the system physical address width
should ever be entered into the rangeset. Kind of as an implication
iomem.last also is guaranteed to be below ~0UL when making it here.

Jan
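For illustration, one possible shape of the IO-APIC adjustment suggested
here, reusing has_vioapic() and the comment from the block the patch removes
from arch_iommu_hwdom_init(). This is a hedged sketch only, not code from
the posted patch:

    for ( i = 0; i < nr_ioapics; i++ )
    {
        mfn = paddr_to_pfn(mp_ioapics[i].mpc_apicaddr);
        /* If emulating IO-APIC(s) make sure the base address is unmapped. */
        if ( has_vioapic(d) ||
             !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
            rc |= iomem_deny_access(d, mfn, mfn);
    }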
On Tue, Feb 18, 2025 at 04:32:33PM +0100, Jan Beulich wrote:
> On 17.02.2025 15:16, Roger Pau Monne wrote:
> > The current code in arch_iommu_hwdom_init() kind of open-codes the same
> > MMIO permission ranges that are added to the hardware domain ->iomem_caps.
> > Avoid this duplication and use ->iomem_caps in arch_iommu_hwdom_init() to
> > filter which memory regions should be added to the dom0 IOMMU page-tables.
> >
> > This should result in no change in the memory regions that end up identity
> > mapped in the domain IOMMU page tables.
> >
> > Note the IO-APIC and MCFG page(s) must be set as not accessible for a PVH
> > dom0, otherwise the internal Xen emulation for those ranges won't work.
> > This requires an adjustment in dom0_setup_permissions().
> >
> > Also the special casing of E820_UNUSABLE regions no longer needs to be done
> > in arch_iommu_hwdom_init(), as those regions are already blocked in
> > ->iomem_caps and thus would be removed from the rangeset as part of
> > ->iomem_caps processing in arch_iommu_hwdom_init().
>
> Only almost: ->iomem_caps has them removed only for addresses 1Mb and up.

Right, but we would like to have parity between IOMMU and CPU page-tables,
and hence should also allow IOMMU to map unusable regions below 1Mb if the
CPU is also allowed to map them. I can tweak the commit message to note
this is done on purpose.

> > --- a/xen/arch/x86/dom0_build.c
> > +++ b/xen/arch/x86/dom0_build.c
> > @@ -552,7 +552,8 @@ int __init dom0_setup_permissions(struct domain *d)
> >      for ( i = 0; i < nr_ioapics; i++ )
> >      {
> >          mfn = paddr_to_pfn(mp_ioapics[i].mpc_apicaddr);
> > -        if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
> > +        if ( is_hvm_domain(d) ||
> > +             !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
> >              rc |= iomem_deny_access(d, mfn, mfn);
> >      }
>
> The original code used has_vioapic() and had a comment. At least one of
> the two would better be transplanted here, I think.

I can indeed switch to has_vioapic() and keep the comment.

> > @@ -593,6 +594,22 @@ int __init dom0_setup_permissions(struct domain *d)
> >          rc |= rangeset_add_singleton(mmio_ro_ranges, mfn);
> >      }
> >
> > +    /* For PVH dom0 prevent access to MCFG, it's emulated by Xen. */
> > +    if ( is_hvm_domain(d) )
> > +    {
> > +        for ( i = 0; i < pci_mmcfg_config_num; i++ )
> > +        {
> > +            const unsigned long s =
> > +                PFN_DOWN(pci_mmcfg_config[i].address) +
> > +                PCI_BDF(pci_mmcfg_config[i].start_bus_number, 0, 0);
> > +            const unsigned long e =
> > +                PFN_DOWN(pci_mmcfg_config[i].address) +
> > +                PCI_BDF(pci_mmcfg_config[i].end_bus_number, ~0, ~0);
> > +
> > +            rc |= iomem_deny_access(d, s, e);
> > +        }
> > +    }
>
> Similarly here, the original code used has_vpci() and also had a comment.
> Is there actually a strong reason to replace the original MCFG code?

Hm, I see, I could tweak vpci_subtract_mmcfg() to remove the regions from
->iomem_caps instead of open-coding here.

> > --- a/xen/drivers/passthrough/x86/iommu.c
> > +++ b/xen/drivers/passthrough/x86/iommu.c
> > @@ -320,6 +320,26 @@ static int __hwdom_init cf_check map_subtract(unsigned long s, unsigned long e,
> >      return rangeset_remove_range(map, s, e);
> >  }
> >
> > +struct handle_iomemcap {
> > +    struct rangeset *r;
> > +    unsigned long last;
> > +};
> > +static int __hwdom_init cf_check map_subtract_iomemcap(unsigned long s,
> > +                                                       unsigned long e,
> > +                                                       void *data)
> > +{
> > +    struct handle_iomemcap *h = data;
> > +    int rc = 0;
> > +
> > +    if ( h->last != s )
> > +        rc = rangeset_remove_range(h->r, h->last, s - 1);
> > +
> > +    /* Be careful with overflows. */
> > +    h->last = max(e + 1, e);
>
> What overflow could occur here? Addresses are limited to 52 bits; frame
> numbers are limited to 40 bits. And ->iomem_caps starts out with a range
> ending at the last address permitted by paddr_bits.

I was misremembering ->iomem_caps being initialized as [0, ~0UL] for dom0,
but that's not the case. I can indeed drop the max() here.

> > @@ -475,22 +488,13 @@ void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
> >      if ( rc )
> >          panic("IOMMU failed to remove Xen ranges: %d\n", rc);
> >
> > -    /* Remove any overlap with the Interrupt Address Range. */
> > -    rc = rangeset_remove_range(map, 0xfee00, 0xfeeff);
> > +    iomem.r = map;
> > +    rc = rangeset_report_ranges(d->iomem_caps, 0, ~0UL, map_subtract_iomemcap,
> > +                                &iomem);
> > +    if ( !rc && iomem.last < ~0UL )
> > +        rc = rangeset_remove_range(map, iomem.last, ~0UL);
>
> Similarly here I'm wondering about the upper bound of ~0UL. Is this just
> to be on the safe side? And/or just because it's simpler than calculating
> the actual upper bound?

I could use the actual upper bound, as we already use it a bit below in:

    /* Remove any regions past the last address addressable by the domain. */
    rc = rangeset_remove_range(map, PFN_DOWN(1UL << domain_max_paddr_bits(d)),
                               ~0UL);

However using ~0UL is also correct for the purposes here.

> No ranges above the system physical address width
> should ever be entered into the rangeset. Kind of as an implication
> iomem.last also is guaranteed to be below ~0UL when making it here.

I could add an assert:

    ASSERT(iomem.last < PFN_DOWN(1UL << domain_max_paddr_bits(d)));

But then I would need to adjust dom0_setup_permissions() to also use
domain_max_paddr_bits() instead of using paddr_bits for the ->iomem_caps
initial max address. That should likely be a separate patch on its own.

Thanks, Roger.
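As a rough sketch of the vpci_subtract_mmcfg() tweak mentioned above, the
existing helper (which the diff below removes) could be pointed at
->iomem_caps instead of the IOMMU map rangeset; the call site and its
placement are assumptions, not part of this patch:

    /*
     * Hedged sketch only: subtract the MCFG pages from ->iomem_caps rather
     * than open-coding the ranges in dom0_setup_permissions().  Where and
     * when this would run is an assumption.
     */
    if ( has_vpci(d) )
    {
        rc = vpci_subtract_mmcfg(d, d->iomem_caps);
        if ( rc )
            return rc;
    }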
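And a minimal sketch of map_subtract_iomemcap() with the max() dropped,
plus the assertion mentioned above at the call site; it assumes
->iomem_caps stays bounded by the domain's physical address width, as
discussed, and is illustrative rather than the code of a later revision:

    static int __hwdom_init cf_check map_subtract_iomemcap(unsigned long s,
                                                           unsigned long e,
                                                           void *data)
    {
        struct handle_iomemcap *h = data;
        int rc = 0;

        /* Remove the gap between the previously reported range and this one. */
        if ( h->last != s )
            rc = rangeset_remove_range(h->r, h->last, s - 1);

        /* ->iomem_caps never extends to ~0UL, so e + 1 cannot wrap. */
        h->last = e + 1;

        return rc;
    }

    /* At the call site, once rangeset_report_ranges() has succeeded: */
    ASSERT(iomem.last < PFN_DOWN(1UL << domain_max_paddr_bits(d)));
    rc = rangeset_remove_range(map, iomem.last, ~0UL);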
diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
index e8f5bf5447bc..22da9ba5a362 100644
--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -552,7 +552,8 @@ int __init dom0_setup_permissions(struct domain *d)
     for ( i = 0; i < nr_ioapics; i++ )
     {
         mfn = paddr_to_pfn(mp_ioapics[i].mpc_apicaddr);
-        if ( !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
+        if ( is_hvm_domain(d) ||
+             !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
             rc |= iomem_deny_access(d, mfn, mfn);
     }
     /* MSI range. */
@@ -593,6 +594,22 @@ int __init dom0_setup_permissions(struct domain *d)
         rc |= rangeset_add_singleton(mmio_ro_ranges, mfn);
     }
 
+    /* For PVH dom0 prevent access to MCFG, it's emulated by Xen. */
+    if ( is_hvm_domain(d) )
+    {
+        for ( i = 0; i < pci_mmcfg_config_num; i++ )
+        {
+            const unsigned long s =
+                PFN_DOWN(pci_mmcfg_config[i].address) +
+                PCI_BDF(pci_mmcfg_config[i].start_bus_number, 0, 0);
+            const unsigned long e =
+                PFN_DOWN(pci_mmcfg_config[i].address) +
+                PCI_BDF(pci_mmcfg_config[i].end_bus_number, ~0, ~0);
+
+            rc |= iomem_deny_access(d, s, e);
+        }
+    }
+
     return rc;
 }
 
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index db726b38177b..174c3285a636 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -363,22 +363,6 @@ static const struct hvm_mmcfg *vpci_mmcfg_find(const struct domain *d,
     return NULL;
 }
 
-int __hwdom_init vpci_subtract_mmcfg(const struct domain *d, struct rangeset *r)
-{
-    const struct hvm_mmcfg *mmcfg;
-
-    list_for_each_entry ( mmcfg, &d->arch.hvm.mmcfg_regions, next )
-    {
-        int rc = rangeset_remove_range(r, PFN_DOWN(mmcfg->addr),
-                                       PFN_DOWN(mmcfg->addr + mmcfg->size - 1));
-
-        if ( rc )
-            return rc;
-    }
-
-    return 0;
-}
-
 static unsigned int vpci_mmcfg_decode_addr(const struct hvm_mmcfg *mmcfg,
                                            paddr_t addr, pci_sbdf_t *sbdf)
 {
diff --git a/xen/arch/x86/include/asm/hvm/io.h b/xen/arch/x86/include/asm/hvm/io.h
index f2b8431facb0..0881e9e9f72d 100644
--- a/xen/arch/x86/include/asm/hvm/io.h
+++ b/xen/arch/x86/include/asm/hvm/io.h
@@ -132,9 +132,6 @@ int register_vpci_mmcfg_handler(struct domain *d, paddr_t addr,
 /* Destroy tracked MMCFG areas. */
 void destroy_vpci_mmcfg(struct domain *d);
 
-/* Remove MMCFG regions from a given rangeset. */
-int vpci_subtract_mmcfg(const struct domain *d, struct rangeset *r);
-
 #endif /* __ASM_X86_HVM_IO_H__ */
 
diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
index 8b1e0596b84a..3b386eb19d13 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -320,6 +320,26 @@ static int __hwdom_init cf_check map_subtract(unsigned long s, unsigned long e,
     return rangeset_remove_range(map, s, e);
 }
 
+struct handle_iomemcap {
+    struct rangeset *r;
+    unsigned long last;
+};
+static int __hwdom_init cf_check map_subtract_iomemcap(unsigned long s,
+                                                       unsigned long e,
+                                                       void *data)
+{
+    struct handle_iomemcap *h = data;
+    int rc = 0;
+
+    if ( h->last != s )
+        rc = rangeset_remove_range(h->r, h->last, s - 1);
+
+    /* Be careful with overflows. */
+    h->last = max(e + 1, e);
+
+    return rc;
+}
+
 struct map_data {
     struct domain *d;
     unsigned int flush_flags;
@@ -400,6 +420,7 @@ void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
     unsigned int i;
     struct rangeset *map;
     struct map_data map_data = { .d = d };
+    struct handle_iomemcap iomem = {};
     int rc;
 
     BUG_ON(!is_hardware_domain(d));
@@ -442,14 +463,6 @@ void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
 
         switch ( entry.type )
         {
-        case E820_UNUSABLE:
-            /* Only relevant for inclusive mode, otherwise this is a no-op. */
-            rc = rangeset_remove_range(map, PFN_DOWN(entry.addr),
-                                       PFN_DOWN(entry.addr + entry.size - 1));
-            if ( rc )
-                panic("IOMMU failed to remove unusable memory: %d\n", rc);
-            continue;
-
         case E820_RESERVED:
             if ( !iommu_hwdom_inclusive && !iommu_hwdom_reserved )
                 continue;
@@ -475,22 +488,13 @@ void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
     if ( rc )
         panic("IOMMU failed to remove Xen ranges: %d\n", rc);
 
-    /* Remove any overlap with the Interrupt Address Range. */
-    rc = rangeset_remove_range(map, 0xfee00, 0xfeeff);
+    iomem.r = map;
+    rc = rangeset_report_ranges(d->iomem_caps, 0, ~0UL, map_subtract_iomemcap,
+                                &iomem);
+    if ( !rc && iomem.last < ~0UL )
+        rc = rangeset_remove_range(map, iomem.last, ~0UL);
     if ( rc )
-        panic("IOMMU failed to remove Interrupt Address Range: %d\n", rc);
-
-    /* If emulating IO-APIC(s) make sure the base address is unmapped. */
-    if ( has_vioapic(d) )
-    {
-        for ( i = 0; i < d->arch.hvm.nr_vioapics; i++ )
-        {
-            rc = rangeset_remove_singleton(map,
-                              PFN_DOWN(domain_vioapic(d, i)->base_address));
-            if ( rc )
-                panic("IOMMU failed to remove IO-APIC: %d\n", rc);
-        }
-    }
+        panic("IOMMU failed to remove forbidden regions: %d\n", rc);
 
     if ( is_pv_domain(d) )
     {
@@ -506,17 +510,6 @@ void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
             panic("IOMMU failed to remove read-only regions: %d\n", rc);
     }
 
-    if ( has_vpci(d) )
-    {
-        /*
-         * TODO: runtime added MMCFG regions are not checked to make sure they
-         * don't overlap with already mapped regions, thus preventing trapping.
-         */
-        rc = vpci_subtract_mmcfg(d, map);
-        if ( rc )
-            panic("IOMMU unable to remove MMCFG areas: %d\n", rc);
-    }
-
     /* Remove any regions past the last address addressable by the domain. */
     rc = rangeset_remove_range(map, PFN_DOWN(1UL << domain_max_paddr_bits(d)),
                                ~0UL);
The current code in arch_iommu_hwdom_init() kind of open-codes the same
MMIO permission ranges that are added to the hardware domain ->iomem_caps.
Avoid this duplication and use ->iomem_caps in arch_iommu_hwdom_init() to
filter which memory regions should be added to the dom0 IOMMU page-tables.

This should result in no change in the memory regions that end up identity
mapped in the domain IOMMU page tables.

Note the IO-APIC and MCFG page(s) must be set as not accessible for a PVH
dom0, otherwise the internal Xen emulation for those ranges won't work.
This requires an adjustment in dom0_setup_permissions().

Also the special casing of E820_UNUSABLE regions no longer needs to be done
in arch_iommu_hwdom_init(), as those regions are already blocked in
->iomem_caps and thus would be removed from the rangeset as part of
->iomem_caps processing in arch_iommu_hwdom_init().

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v1:
 - New in this version.
---
 xen/arch/x86/dom0_build.c           | 19 ++++++++-
 xen/arch/x86/hvm/io.c               | 16 --------
 xen/arch/x86/include/asm/hvm/io.h   |  3 --
 xen/drivers/passthrough/x86/iommu.c | 61 +++++++++++++----------------
 4 files changed, 45 insertions(+), 54 deletions(-)