Message ID | 20250219164840.94803-4-roger.pau@citrix.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | x86/dom0: be less restrictive with the Interrupt Address Range | expand |
On 19.02.2025 17:48, Roger Pau Monne wrote: > Xen currently prevents dom0 from creating CPU or IOMMU page-table mappings > into the interrupt address range [0xfee00000, 0xfeefffff]. This range has > two different purposes. For accesses from the CPU is contains the default > position of local APIC page at 0xfee00000. For accesses from devices > it's the MSI address range, so the address field in the MSI entries > (usually) point to an address on that range to trigger an interrupt. > > There are reports of Lenovo Thinkpad devices placing what seems to be the > UCSI shared mailbox at address 0xfeec2000 in the interrupt address range. > Attempting to use that device with a Linux PV dom0 leads to an error when > Linux kernel maps 0xfeec2000: > > RIP: e030:xen_mc_flush+0x1e8/0x2b0 > xen_leave_lazy_mmu+0x15/0x60 > vmap_range_noflush+0x408/0x6f0 > __ioremap_caller+0x20d/0x350 > acpi_os_map_iomem+0x1a3/0x1c0 > acpi_ex_system_memory_space_handler+0x229/0x3f0 > acpi_ev_address_space_dispatch+0x17e/0x4c0 > acpi_ex_access_region+0x28a/0x510 > acpi_ex_field_datum_io+0x95/0x5c0 > acpi_ex_extract_from_field+0x36b/0x4e0 > acpi_ex_read_data_from_field+0xcb/0x430 > acpi_ex_resolve_node_to_value+0x2e0/0x530 > acpi_ex_resolve_to_value+0x1e7/0x550 > acpi_ds_evaluate_name_path+0x107/0x170 > acpi_ds_exec_end_op+0x392/0x860 > acpi_ps_parse_loop+0x268/0xa30 > acpi_ps_parse_aml+0x221/0x5e0 > acpi_ps_execute_method+0x171/0x3e0 > acpi_ns_evaluate+0x174/0x5d0 > acpi_evaluate_object+0x167/0x440 > acpi_evaluate_dsm+0xb6/0x130 > ucsi_acpi_dsm+0x53/0x80 > ucsi_acpi_read+0x2e/0x60 > ucsi_register+0x24/0xa0 > ucsi_acpi_probe+0x162/0x1e3 > platform_probe+0x48/0x90 > really_probe+0xde/0x340 > __driver_probe_device+0x78/0x110 > driver_probe_device+0x1f/0x90 > __driver_attach+0xd2/0x1c0 > bus_for_each_dev+0x77/0xc0 > bus_add_driver+0x112/0x1f0 > driver_register+0x72/0xd0 > do_one_initcall+0x48/0x300 > do_init_module+0x60/0x220 > __do_sys_init_module+0x17f/0x1b0 > do_syscall_64+0x82/0x170 > > Remove the restrictions to create mappings the interrupt address range for Nit: Missing "in"? > dom0. Note that the restriction to map the local APIC page is enforced > separately, and that continues to be present. Additionally make sure the > emulated local APIC page is also not mapped, in case dom0 is using it. But that's in GFN space, not in MFN one. Why would that matter for iomem_caps? Jan
On Thu, Feb 20, 2025 at 09:33:46AM +0100, Jan Beulich wrote: > On 19.02.2025 17:48, Roger Pau Monne wrote: > > Xen currently prevents dom0 from creating CPU or IOMMU page-table mappings > > into the interrupt address range [0xfee00000, 0xfeefffff]. This range has > > two different purposes. For accesses from the CPU is contains the default > > position of local APIC page at 0xfee00000. For accesses from devices > > it's the MSI address range, so the address field in the MSI entries > > (usually) point to an address on that range to trigger an interrupt. > > > > There are reports of Lenovo Thinkpad devices placing what seems to be the > > UCSI shared mailbox at address 0xfeec2000 in the interrupt address range. > > Attempting to use that device with a Linux PV dom0 leads to an error when > > Linux kernel maps 0xfeec2000: > > > > RIP: e030:xen_mc_flush+0x1e8/0x2b0 > > xen_leave_lazy_mmu+0x15/0x60 > > vmap_range_noflush+0x408/0x6f0 > > __ioremap_caller+0x20d/0x350 > > acpi_os_map_iomem+0x1a3/0x1c0 > > acpi_ex_system_memory_space_handler+0x229/0x3f0 > > acpi_ev_address_space_dispatch+0x17e/0x4c0 > > acpi_ex_access_region+0x28a/0x510 > > acpi_ex_field_datum_io+0x95/0x5c0 > > acpi_ex_extract_from_field+0x36b/0x4e0 > > acpi_ex_read_data_from_field+0xcb/0x430 > > acpi_ex_resolve_node_to_value+0x2e0/0x530 > > acpi_ex_resolve_to_value+0x1e7/0x550 > > acpi_ds_evaluate_name_path+0x107/0x170 > > acpi_ds_exec_end_op+0x392/0x860 > > acpi_ps_parse_loop+0x268/0xa30 > > acpi_ps_parse_aml+0x221/0x5e0 > > acpi_ps_execute_method+0x171/0x3e0 > > acpi_ns_evaluate+0x174/0x5d0 > > acpi_evaluate_object+0x167/0x440 > > acpi_evaluate_dsm+0xb6/0x130 > > ucsi_acpi_dsm+0x53/0x80 > > ucsi_acpi_read+0x2e/0x60 > > ucsi_register+0x24/0xa0 > > ucsi_acpi_probe+0x162/0x1e3 > > platform_probe+0x48/0x90 > > really_probe+0xde/0x340 > > __driver_probe_device+0x78/0x110 > > driver_probe_device+0x1f/0x90 > > __driver_attach+0xd2/0x1c0 > > bus_for_each_dev+0x77/0xc0 > > bus_add_driver+0x112/0x1f0 > > driver_register+0x72/0xd0 > > do_one_initcall+0x48/0x300 > > do_init_module+0x60/0x220 > > __do_sys_init_module+0x17f/0x1b0 > > do_syscall_64+0x82/0x170 > > > > Remove the restrictions to create mappings the interrupt address range for > > Nit: Missing "in"? Indeed, thanks for spotting. > > dom0. Note that the restriction to map the local APIC page is enforced > > separately, and that continues to be present. Additionally make sure the > > emulated local APIC page is also not mapped, in case dom0 is using it. > > But that's in GFN space, not in MFN one. Why would that matter for iomem_caps? It's required to avoid arch_iommu_hwdom_init() creating an identity mapping for APIC_DEFAULT_PHYS_BASE, which would prevent the local APIC emulation from being used. Note that mp_lapic_addr can be zeor if the host local APICs are started in x2APIC mode, or it could (in theory) contain an address different than APIC_DEFAULT_PHYS_BASE. Thanks, Roger.
On 20.02.2025 09:55, Roger Pau Monné wrote: > On Thu, Feb 20, 2025 at 09:33:46AM +0100, Jan Beulich wrote: >> On 19.02.2025 17:48, Roger Pau Monne wrote: >>> Note that the restriction to map the local APIC page is enforced >>> separately, and that continues to be present. Additionally make sure the >>> emulated local APIC page is also not mapped, in case dom0 is using it. >> >> But that's in GFN space, not in MFN one. Why would that matter for iomem_caps? > > It's required to avoid arch_iommu_hwdom_init() creating an identity > mapping for APIC_DEFAULT_PHYS_BASE, which would prevent the local APIC > emulation from being used. Hmm, yes, on one hand such a mapping would be created by default, as we default to "dom0-iommu=map-reserved". Otoh that mapping would be replaced before Dom0 is actually started, via the domain_creation_finished() hook. On (modern) VMX that is. So yes, on old VMX and on SVM the slot would need to remain unpopulated. Otoh, when the physical LAPICs are elsewhere and when the domain is in x2APIC mode, there would be no reason to disallow Dom0 access to that page. That would apparently mean fiddling with iomem_caps once all vCPU-s have entered x2APIC mode. With LAPICs not normally being elsewhere, question is whether this corner case actually needs dealing with. Yet even if not, commentary may want extending, just to make clear the case was considered? > Note that mp_lapic_addr can be zeor if the host local APICs are > started in x2APIC mode, or it could (in theory) contain an address > different than APIC_DEFAULT_PHYS_BASE. Of course; I didn't mean to suggest what you do is simply redundant. Jan
On Thu, Feb 20, 2025 at 02:30:38PM +0100, Jan Beulich wrote: > On 20.02.2025 09:55, Roger Pau Monné wrote: > > On Thu, Feb 20, 2025 at 09:33:46AM +0100, Jan Beulich wrote: > >> On 19.02.2025 17:48, Roger Pau Monne wrote: > >>> Note that the restriction to map the local APIC page is enforced > >>> separately, and that continues to be present. Additionally make sure the > >>> emulated local APIC page is also not mapped, in case dom0 is using it. > >> > >> But that's in GFN space, not in MFN one. Why would that matter for iomem_caps? > > > > It's required to avoid arch_iommu_hwdom_init() creating an identity > > mapping for APIC_DEFAULT_PHYS_BASE, which would prevent the local APIC > > emulation from being used. > > Hmm, yes, on one hand such a mapping would be created by default, as we > default to "dom0-iommu=map-reserved". Otoh that mapping would be replaced > before Dom0 is actually started, via the domain_creation_finished() hook. > On (modern) VMX that is. So yes, on old VMX and on SVM the slot would need > to remain unpopulated. Otoh, when the physical LAPICs are elsewhere and > when the domain is in x2APIC mode, there would be no reason to disallow > Dom0 access to that page. Right, but that's now how dom0 is started ATM, as the local APIC is unconditionally started in xAPIC mode and at APIC_DEFAULT_PHYS_BASE. I could use vlapic_base_address() against vCPU#0 vlapic, but even in guest_wrmsr_apic_base() we don't allow moving the local APIC MMIO region, and hence I assumed it was fine to just use APIC_DEFAULT_PHYS_BASE here. Note in pvh_setup_acpi_madt() Xen also hardcodes the local APIC address to APIC_DEFAULT_PHYS_BASE. Would you be fine if I expand the comment so it's: /* If using an emulated local APIC make sure its MMIO is unpopulated. */ if ( has_vlapic(d) ) { /* Xen doesn't allow changing the local APIC MMIO window position. */ mfn = paddr_to_pfn(APIC_DEFAULT_PHYS_BASE); rc |= iomem_deny_access(d, mfn, mfn); } > That would apparently mean fiddling with > iomem_caps once all vCPU-s have entered x2APIC mode. Urg, that seems ugly. It would also need undoing if the APICs are reverted to xAPIC mode? > With LAPICs not > normally being elsewhere, question is whether this corner case actually > needs dealing with. Yet even if not, commentary may want extending, just > to make clear the case was considered? As said above, for both HVM and PVH Xen doesn't allow moving the APIC MMIO window to anything different than APIC_DEFAULT_PHYS_BASE. Thanks, Roger.
On 20.02.2025 16:40, Roger Pau Monné wrote: > On Thu, Feb 20, 2025 at 02:30:38PM +0100, Jan Beulich wrote: >> On 20.02.2025 09:55, Roger Pau Monné wrote: >>> On Thu, Feb 20, 2025 at 09:33:46AM +0100, Jan Beulich wrote: >>>> On 19.02.2025 17:48, Roger Pau Monne wrote: >>>>> Note that the restriction to map the local APIC page is enforced >>>>> separately, and that continues to be present. Additionally make sure the >>>>> emulated local APIC page is also not mapped, in case dom0 is using it. >>>> >>>> But that's in GFN space, not in MFN one. Why would that matter for iomem_caps? >>> >>> It's required to avoid arch_iommu_hwdom_init() creating an identity >>> mapping for APIC_DEFAULT_PHYS_BASE, which would prevent the local APIC >>> emulation from being used. >> >> Hmm, yes, on one hand such a mapping would be created by default, as we >> default to "dom0-iommu=map-reserved". Otoh that mapping would be replaced >> before Dom0 is actually started, via the domain_creation_finished() hook. >> On (modern) VMX that is. So yes, on old VMX and on SVM the slot would need >> to remain unpopulated. Otoh, when the physical LAPICs are elsewhere and >> when the domain is in x2APIC mode, there would be no reason to disallow >> Dom0 access to that page. > > Right, but that's now how dom0 is started ATM, as the local APIC is > unconditionally started in xAPIC mode and at APIC_DEFAULT_PHYS_BASE. > > I could use vlapic_base_address() against vCPU#0 vlapic, but even in > guest_wrmsr_apic_base() we don't allow moving the local APIC MMIO > region, and hence I assumed it was fine to just use > APIC_DEFAULT_PHYS_BASE here. Note in pvh_setup_acpi_madt() Xen also > hardcodes the local APIC address to APIC_DEFAULT_PHYS_BASE. > > Would you be fine if I expand the comment so it's: > > /* If using an emulated local APIC make sure its MMIO is unpopulated. */ > if ( has_vlapic(d) ) > { > /* Xen doesn't allow changing the local APIC MMIO window position. */ > mfn = paddr_to_pfn(APIC_DEFAULT_PHYS_BASE); > rc |= iomem_deny_access(d, mfn, mfn); > } That will do, I think. Then: Acked-by: Jan Beulich <jbeulich@suse.com> >> That would apparently mean fiddling with >> iomem_caps once all vCPU-s have entered x2APIC mode. > > Urg, that seems ugly. It would also need undoing if the APICs are > reverted to xAPIC mode? Right. >> With LAPICs not >> normally being elsewhere, question is whether this corner case actually >> needs dealing with. Yet even if not, commentary may want extending, just >> to make clear the case was considered? > > As said above, for both HVM and PVH Xen doesn't allow moving the APIC > MMIO window to anything different than APIC_DEFAULT_PHYS_BASE. I was talking about the real one Xen uses. Jan
On Thu, Feb 20, 2025 at 05:05:44PM +0100, Jan Beulich wrote: > On 20.02.2025 16:40, Roger Pau Monné wrote: > > On Thu, Feb 20, 2025 at 02:30:38PM +0100, Jan Beulich wrote: > >> On 20.02.2025 09:55, Roger Pau Monné wrote: > >>> On Thu, Feb 20, 2025 at 09:33:46AM +0100, Jan Beulich wrote: > >>>> On 19.02.2025 17:48, Roger Pau Monne wrote: > >>>>> Note that the restriction to map the local APIC page is enforced > >>>>> separately, and that continues to be present. Additionally make sure the > >>>>> emulated local APIC page is also not mapped, in case dom0 is using it. > >>>> > >>>> But that's in GFN space, not in MFN one. Why would that matter for iomem_caps? > >>> > >>> It's required to avoid arch_iommu_hwdom_init() creating an identity > >>> mapping for APIC_DEFAULT_PHYS_BASE, which would prevent the local APIC > >>> emulation from being used. > >> > >> Hmm, yes, on one hand such a mapping would be created by default, as we > >> default to "dom0-iommu=map-reserved". Otoh that mapping would be replaced > >> before Dom0 is actually started, via the domain_creation_finished() hook. > >> On (modern) VMX that is. So yes, on old VMX and on SVM the slot would need > >> to remain unpopulated. Otoh, when the physical LAPICs are elsewhere and > >> when the domain is in x2APIC mode, there would be no reason to disallow > >> Dom0 access to that page. > > > > Right, but that's now how dom0 is started ATM, as the local APIC is > > unconditionally started in xAPIC mode and at APIC_DEFAULT_PHYS_BASE. > > > > I could use vlapic_base_address() against vCPU#0 vlapic, but even in > > guest_wrmsr_apic_base() we don't allow moving the local APIC MMIO > > region, and hence I assumed it was fine to just use > > APIC_DEFAULT_PHYS_BASE here. Note in pvh_setup_acpi_madt() Xen also > > hardcodes the local APIC address to APIC_DEFAULT_PHYS_BASE. > > > > Would you be fine if I expand the comment so it's: > > > > /* If using an emulated local APIC make sure its MMIO is unpopulated. */ > > if ( has_vlapic(d) ) > > { > > /* Xen doesn't allow changing the local APIC MMIO window position. */ > > mfn = paddr_to_pfn(APIC_DEFAULT_PHYS_BASE); > > rc |= iomem_deny_access(d, mfn, mfn); > > } > > That will do, I think. Then: > Acked-by: Jan Beulich <jbeulich@suse.com> Thanks. > >> That would apparently mean fiddling with > >> iomem_caps once all vCPU-s have entered x2APIC mode. > > > > Urg, that seems ugly. It would also need undoing if the APICs are > > reverted to xAPIC mode? > > Right. > > >> With LAPICs not > >> normally being elsewhere, question is whether this corner case actually > >> needs dealing with. Yet even if not, commentary may want extending, just > >> to make clear the case was considered? > > > > As said above, for both HVM and PVH Xen doesn't allow moving the APIC > > MMIO window to anything different than APIC_DEFAULT_PHYS_BASE. > > I was talking about the real one Xen uses. Oh, OK, I was confused by the context I think, sorry then. Roger.
diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c index a735e3650c28..3cda0c2c8765 100644 --- a/xen/arch/x86/dom0_build.c +++ b/xen/arch/x86/dom0_build.c @@ -554,6 +554,12 @@ int __init dom0_setup_permissions(struct domain *d) mfn = paddr_to_pfn(mp_lapic_addr); rc |= iomem_deny_access(d, mfn, mfn); } + /* If using an emulated local APIC make sure its MMIO is unpopulated. */ + if ( has_vlapic(d) ) + { + mfn = paddr_to_pfn(APIC_DEFAULT_PHYS_BASE); + rc |= iomem_deny_access(d, mfn, mfn); + } /* I/O APICs. */ for ( i = 0; i < nr_ioapics; i++ ) { @@ -563,10 +569,6 @@ int __init dom0_setup_permissions(struct domain *d) !rangeset_contains_singleton(mmio_ro_ranges, mfn) ) rc |= iomem_deny_access(d, mfn, mfn); } - /* MSI range. */ - rc |= iomem_deny_access(d, paddr_to_pfn(MSI_ADDR_BASE_LO), - paddr_to_pfn(MSI_ADDR_BASE_LO + - MSI_ADDR_DEST_ID_MASK)); /* HyperTransport range. */ if ( boot_cpu_data.x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON) ) {