diff mbox series

[v3,3/3] x86/dom0: be less restrictive with the Interrupt Address Range

Message ID 20250219164840.94803-4-roger.pau@citrix.com (mailing list archive)
State New
Headers show
Series x86/dom0: be less restrictive with the Interrupt Address Range | expand

Commit Message

Roger Pau Monné Feb. 19, 2025, 4:48 p.m. UTC
Xen currently prevents dom0 from creating CPU or IOMMU page-table mappings
into the interrupt address range [0xfee00000, 0xfeefffff].  This range has
two different purposes.  For accesses from the CPU is contains the default
position of local APIC page at 0xfee00000.  For accesses from devices
it's the MSI address range, so the address field in the MSI entries
(usually) point to an address on that range to trigger an interrupt.

There are reports of Lenovo Thinkpad devices placing what seems to be the
UCSI shared mailbox at address 0xfeec2000 in the interrupt address range.
Attempting to use that device with a Linux PV dom0 leads to an error when
Linux kernel maps 0xfeec2000:

RIP: e030:xen_mc_flush+0x1e8/0x2b0
 xen_leave_lazy_mmu+0x15/0x60
 vmap_range_noflush+0x408/0x6f0
 __ioremap_caller+0x20d/0x350
 acpi_os_map_iomem+0x1a3/0x1c0
 acpi_ex_system_memory_space_handler+0x229/0x3f0
 acpi_ev_address_space_dispatch+0x17e/0x4c0
 acpi_ex_access_region+0x28a/0x510
 acpi_ex_field_datum_io+0x95/0x5c0
 acpi_ex_extract_from_field+0x36b/0x4e0
 acpi_ex_read_data_from_field+0xcb/0x430
 acpi_ex_resolve_node_to_value+0x2e0/0x530
 acpi_ex_resolve_to_value+0x1e7/0x550
 acpi_ds_evaluate_name_path+0x107/0x170
 acpi_ds_exec_end_op+0x392/0x860
 acpi_ps_parse_loop+0x268/0xa30
 acpi_ps_parse_aml+0x221/0x5e0
 acpi_ps_execute_method+0x171/0x3e0
 acpi_ns_evaluate+0x174/0x5d0
 acpi_evaluate_object+0x167/0x440
 acpi_evaluate_dsm+0xb6/0x130
 ucsi_acpi_dsm+0x53/0x80
 ucsi_acpi_read+0x2e/0x60
 ucsi_register+0x24/0xa0
 ucsi_acpi_probe+0x162/0x1e3
 platform_probe+0x48/0x90
 really_probe+0xde/0x340
 __driver_probe_device+0x78/0x110
 driver_probe_device+0x1f/0x90
 __driver_attach+0xd2/0x1c0
 bus_for_each_dev+0x77/0xc0
 bus_add_driver+0x112/0x1f0
 driver_register+0x72/0xd0
 do_one_initcall+0x48/0x300
 do_init_module+0x60/0x220
 __do_sys_init_module+0x17f/0x1b0
 do_syscall_64+0x82/0x170

Remove the restrictions to create mappings the interrupt address range for
dom0.  Note that the restriction to map the local APIC page is enforced
separately, and that continues to be present.  Additionally make sure the
emulated local APIC page is also not mapped, in case dom0 is using it.

Note that even if the interrupt address range entries are populated in the
IOMMU page-tables no device access will reach those pages.  Device accesses
to the Interrupt Address Range will always be converted into Interrupt
Messages and are not subject to DMA remapping.

There's also the following restriction noted in Intel VT-d:

> Software must not program paging-structure entries to remap any address to
> the interrupt address range. Untranslated requests and translation requests
> that result in an address in the interrupt range will be blocked with
> condition code LGN.4 or SGN.8. Translated requests with an address in the
> interrupt address range are treated as Unsupported Request (UR).

Similarly for AMD-Vi:

> Accesses to the interrupt address range (Table 3) are defined to go through
> the interrupt remapping portion of the IOMMU and not through address
> translation processing. Therefore, when a transaction is being processed as
> an interrupt remapping operation, the transaction attribute of
> pretranslated or untranslated is ignored.
>
> Software Note: The IOMMU should
> not be configured such that an address translation results in a special
> address such as the interrupt address range.

However those restrictions don't apply to the identity mappings possibly
created for dom0, since the interrupt address range is never subject to DMA
remapping, and hence there's no output address after translation that
belongs to the interrupt address range.

Reported-by: Jürgen Groß <jgross@suse.com>
Link: https://lore.kernel.org/xen-devel/baade0a7-e204-4743-bda1-282df74e5f89@suse.com/
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v2:
 - Make sure vlapic page is not mapped.

Changes since v1:
 - No longer needs to modify arch_iommu_hwdom_init().
---
 xen/arch/x86/dom0_build.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

Comments

Jan Beulich Feb. 20, 2025, 8:33 a.m. UTC | #1
On 19.02.2025 17:48, Roger Pau Monne wrote:
> Xen currently prevents dom0 from creating CPU or IOMMU page-table mappings
> into the interrupt address range [0xfee00000, 0xfeefffff].  This range has
> two different purposes.  For accesses from the CPU is contains the default
> position of local APIC page at 0xfee00000.  For accesses from devices
> it's the MSI address range, so the address field in the MSI entries
> (usually) point to an address on that range to trigger an interrupt.
> 
> There are reports of Lenovo Thinkpad devices placing what seems to be the
> UCSI shared mailbox at address 0xfeec2000 in the interrupt address range.
> Attempting to use that device with a Linux PV dom0 leads to an error when
> Linux kernel maps 0xfeec2000:
> 
> RIP: e030:xen_mc_flush+0x1e8/0x2b0
>  xen_leave_lazy_mmu+0x15/0x60
>  vmap_range_noflush+0x408/0x6f0
>  __ioremap_caller+0x20d/0x350
>  acpi_os_map_iomem+0x1a3/0x1c0
>  acpi_ex_system_memory_space_handler+0x229/0x3f0
>  acpi_ev_address_space_dispatch+0x17e/0x4c0
>  acpi_ex_access_region+0x28a/0x510
>  acpi_ex_field_datum_io+0x95/0x5c0
>  acpi_ex_extract_from_field+0x36b/0x4e0
>  acpi_ex_read_data_from_field+0xcb/0x430
>  acpi_ex_resolve_node_to_value+0x2e0/0x530
>  acpi_ex_resolve_to_value+0x1e7/0x550
>  acpi_ds_evaluate_name_path+0x107/0x170
>  acpi_ds_exec_end_op+0x392/0x860
>  acpi_ps_parse_loop+0x268/0xa30
>  acpi_ps_parse_aml+0x221/0x5e0
>  acpi_ps_execute_method+0x171/0x3e0
>  acpi_ns_evaluate+0x174/0x5d0
>  acpi_evaluate_object+0x167/0x440
>  acpi_evaluate_dsm+0xb6/0x130
>  ucsi_acpi_dsm+0x53/0x80
>  ucsi_acpi_read+0x2e/0x60
>  ucsi_register+0x24/0xa0
>  ucsi_acpi_probe+0x162/0x1e3
>  platform_probe+0x48/0x90
>  really_probe+0xde/0x340
>  __driver_probe_device+0x78/0x110
>  driver_probe_device+0x1f/0x90
>  __driver_attach+0xd2/0x1c0
>  bus_for_each_dev+0x77/0xc0
>  bus_add_driver+0x112/0x1f0
>  driver_register+0x72/0xd0
>  do_one_initcall+0x48/0x300
>  do_init_module+0x60/0x220
>  __do_sys_init_module+0x17f/0x1b0
>  do_syscall_64+0x82/0x170
> 
> Remove the restrictions to create mappings the interrupt address range for

Nit: Missing "in"?

> dom0.  Note that the restriction to map the local APIC page is enforced
> separately, and that continues to be present.  Additionally make sure the
> emulated local APIC page is also not mapped, in case dom0 is using it.

But that's in GFN space, not in MFN one. Why would that matter for iomem_caps?

Jan
Roger Pau Monné Feb. 20, 2025, 8:55 a.m. UTC | #2
On Thu, Feb 20, 2025 at 09:33:46AM +0100, Jan Beulich wrote:
> On 19.02.2025 17:48, Roger Pau Monne wrote:
> > Xen currently prevents dom0 from creating CPU or IOMMU page-table mappings
> > into the interrupt address range [0xfee00000, 0xfeefffff].  This range has
> > two different purposes.  For accesses from the CPU is contains the default
> > position of local APIC page at 0xfee00000.  For accesses from devices
> > it's the MSI address range, so the address field in the MSI entries
> > (usually) point to an address on that range to trigger an interrupt.
> > 
> > There are reports of Lenovo Thinkpad devices placing what seems to be the
> > UCSI shared mailbox at address 0xfeec2000 in the interrupt address range.
> > Attempting to use that device with a Linux PV dom0 leads to an error when
> > Linux kernel maps 0xfeec2000:
> > 
> > RIP: e030:xen_mc_flush+0x1e8/0x2b0
> >  xen_leave_lazy_mmu+0x15/0x60
> >  vmap_range_noflush+0x408/0x6f0
> >  __ioremap_caller+0x20d/0x350
> >  acpi_os_map_iomem+0x1a3/0x1c0
> >  acpi_ex_system_memory_space_handler+0x229/0x3f0
> >  acpi_ev_address_space_dispatch+0x17e/0x4c0
> >  acpi_ex_access_region+0x28a/0x510
> >  acpi_ex_field_datum_io+0x95/0x5c0
> >  acpi_ex_extract_from_field+0x36b/0x4e0
> >  acpi_ex_read_data_from_field+0xcb/0x430
> >  acpi_ex_resolve_node_to_value+0x2e0/0x530
> >  acpi_ex_resolve_to_value+0x1e7/0x550
> >  acpi_ds_evaluate_name_path+0x107/0x170
> >  acpi_ds_exec_end_op+0x392/0x860
> >  acpi_ps_parse_loop+0x268/0xa30
> >  acpi_ps_parse_aml+0x221/0x5e0
> >  acpi_ps_execute_method+0x171/0x3e0
> >  acpi_ns_evaluate+0x174/0x5d0
> >  acpi_evaluate_object+0x167/0x440
> >  acpi_evaluate_dsm+0xb6/0x130
> >  ucsi_acpi_dsm+0x53/0x80
> >  ucsi_acpi_read+0x2e/0x60
> >  ucsi_register+0x24/0xa0
> >  ucsi_acpi_probe+0x162/0x1e3
> >  platform_probe+0x48/0x90
> >  really_probe+0xde/0x340
> >  __driver_probe_device+0x78/0x110
> >  driver_probe_device+0x1f/0x90
> >  __driver_attach+0xd2/0x1c0
> >  bus_for_each_dev+0x77/0xc0
> >  bus_add_driver+0x112/0x1f0
> >  driver_register+0x72/0xd0
> >  do_one_initcall+0x48/0x300
> >  do_init_module+0x60/0x220
> >  __do_sys_init_module+0x17f/0x1b0
> >  do_syscall_64+0x82/0x170
> > 
> > Remove the restrictions to create mappings the interrupt address range for
> 
> Nit: Missing "in"?

Indeed, thanks for spotting.

> > dom0.  Note that the restriction to map the local APIC page is enforced
> > separately, and that continues to be present.  Additionally make sure the
> > emulated local APIC page is also not mapped, in case dom0 is using it.
> 
> But that's in GFN space, not in MFN one. Why would that matter for iomem_caps?

It's required to avoid arch_iommu_hwdom_init() creating an identity
mapping for APIC_DEFAULT_PHYS_BASE, which would prevent the local APIC
emulation from being used.

Note that mp_lapic_addr can be zeor if the host local APICs are
started in x2APIC mode, or it could (in theory) contain an address
different than APIC_DEFAULT_PHYS_BASE.

Thanks, Roger.
Jan Beulich Feb. 20, 2025, 1:30 p.m. UTC | #3
On 20.02.2025 09:55, Roger Pau Monné wrote:
> On Thu, Feb 20, 2025 at 09:33:46AM +0100, Jan Beulich wrote:
>> On 19.02.2025 17:48, Roger Pau Monne wrote:
>>> Note that the restriction to map the local APIC page is enforced
>>> separately, and that continues to be present.  Additionally make sure the
>>> emulated local APIC page is also not mapped, in case dom0 is using it.
>>
>> But that's in GFN space, not in MFN one. Why would that matter for iomem_caps?
> 
> It's required to avoid arch_iommu_hwdom_init() creating an identity
> mapping for APIC_DEFAULT_PHYS_BASE, which would prevent the local APIC
> emulation from being used.

Hmm, yes, on one hand such a mapping would be created by default, as we
default to "dom0-iommu=map-reserved". Otoh that mapping would be replaced
before Dom0 is actually started, via the domain_creation_finished() hook.
On (modern) VMX that is. So yes, on old VMX and on SVM the slot would need
to remain unpopulated. Otoh, when the physical LAPICs are elsewhere and
when the domain is in x2APIC mode, there would be no reason to disallow
Dom0 access to that page. That would apparently mean fiddling with
iomem_caps once all vCPU-s have entered x2APIC mode. With LAPICs not
normally being elsewhere, question is whether this corner case actually
needs dealing with. Yet even if not, commentary may want extending, just
to make clear the case was considered?

> Note that mp_lapic_addr can be zeor if the host local APICs are
> started in x2APIC mode, or it could (in theory) contain an address
> different than APIC_DEFAULT_PHYS_BASE.

Of course; I didn't mean to suggest what you do is simply redundant.

Jan
Roger Pau Monné Feb. 20, 2025, 3:40 p.m. UTC | #4
On Thu, Feb 20, 2025 at 02:30:38PM +0100, Jan Beulich wrote:
> On 20.02.2025 09:55, Roger Pau Monné wrote:
> > On Thu, Feb 20, 2025 at 09:33:46AM +0100, Jan Beulich wrote:
> >> On 19.02.2025 17:48, Roger Pau Monne wrote:
> >>> Note that the restriction to map the local APIC page is enforced
> >>> separately, and that continues to be present.  Additionally make sure the
> >>> emulated local APIC page is also not mapped, in case dom0 is using it.
> >>
> >> But that's in GFN space, not in MFN one. Why would that matter for iomem_caps?
> > 
> > It's required to avoid arch_iommu_hwdom_init() creating an identity
> > mapping for APIC_DEFAULT_PHYS_BASE, which would prevent the local APIC
> > emulation from being used.
> 
> Hmm, yes, on one hand such a mapping would be created by default, as we
> default to "dom0-iommu=map-reserved". Otoh that mapping would be replaced
> before Dom0 is actually started, via the domain_creation_finished() hook.
> On (modern) VMX that is. So yes, on old VMX and on SVM the slot would need
> to remain unpopulated. Otoh, when the physical LAPICs are elsewhere and
> when the domain is in x2APIC mode, there would be no reason to disallow
> Dom0 access to that page.

Right, but that's now how dom0 is started ATM, as the local APIC is
unconditionally started in xAPIC mode and at APIC_DEFAULT_PHYS_BASE.

I could use vlapic_base_address() against vCPU#0 vlapic, but even in
guest_wrmsr_apic_base() we don't allow moving the local APIC MMIO
region, and hence I assumed it was fine to just use
APIC_DEFAULT_PHYS_BASE here.  Note in pvh_setup_acpi_madt() Xen also
hardcodes the local APIC address to APIC_DEFAULT_PHYS_BASE.

Would you be fine if I expand the comment so it's:

    /* If using an emulated local APIC make sure its MMIO is unpopulated. */
    if ( has_vlapic(d) )
    {
        /* Xen doesn't allow changing the local APIC MMIO window position. */
        mfn = paddr_to_pfn(APIC_DEFAULT_PHYS_BASE);
        rc |= iomem_deny_access(d, mfn, mfn);
    }

> That would apparently mean fiddling with
> iomem_caps once all vCPU-s have entered x2APIC mode.

Urg, that seems ugly.  It would also need undoing if the APICs are
reverted to xAPIC mode?

> With LAPICs not
> normally being elsewhere, question is whether this corner case actually
> needs dealing with. Yet even if not, commentary may want extending, just
> to make clear the case was considered?

As said above, for both HVM and PVH Xen doesn't allow moving the APIC
MMIO window to anything different than APIC_DEFAULT_PHYS_BASE.

Thanks, Roger.
Jan Beulich Feb. 20, 2025, 4:05 p.m. UTC | #5
On 20.02.2025 16:40, Roger Pau Monné wrote:
> On Thu, Feb 20, 2025 at 02:30:38PM +0100, Jan Beulich wrote:
>> On 20.02.2025 09:55, Roger Pau Monné wrote:
>>> On Thu, Feb 20, 2025 at 09:33:46AM +0100, Jan Beulich wrote:
>>>> On 19.02.2025 17:48, Roger Pau Monne wrote:
>>>>> Note that the restriction to map the local APIC page is enforced
>>>>> separately, and that continues to be present.  Additionally make sure the
>>>>> emulated local APIC page is also not mapped, in case dom0 is using it.
>>>>
>>>> But that's in GFN space, not in MFN one. Why would that matter for iomem_caps?
>>>
>>> It's required to avoid arch_iommu_hwdom_init() creating an identity
>>> mapping for APIC_DEFAULT_PHYS_BASE, which would prevent the local APIC
>>> emulation from being used.
>>
>> Hmm, yes, on one hand such a mapping would be created by default, as we
>> default to "dom0-iommu=map-reserved". Otoh that mapping would be replaced
>> before Dom0 is actually started, via the domain_creation_finished() hook.
>> On (modern) VMX that is. So yes, on old VMX and on SVM the slot would need
>> to remain unpopulated. Otoh, when the physical LAPICs are elsewhere and
>> when the domain is in x2APIC mode, there would be no reason to disallow
>> Dom0 access to that page.
> 
> Right, but that's now how dom0 is started ATM, as the local APIC is
> unconditionally started in xAPIC mode and at APIC_DEFAULT_PHYS_BASE.
> 
> I could use vlapic_base_address() against vCPU#0 vlapic, but even in
> guest_wrmsr_apic_base() we don't allow moving the local APIC MMIO
> region, and hence I assumed it was fine to just use
> APIC_DEFAULT_PHYS_BASE here.  Note in pvh_setup_acpi_madt() Xen also
> hardcodes the local APIC address to APIC_DEFAULT_PHYS_BASE.
> 
> Would you be fine if I expand the comment so it's:
> 
>     /* If using an emulated local APIC make sure its MMIO is unpopulated. */
>     if ( has_vlapic(d) )
>     {
>         /* Xen doesn't allow changing the local APIC MMIO window position. */
>         mfn = paddr_to_pfn(APIC_DEFAULT_PHYS_BASE);
>         rc |= iomem_deny_access(d, mfn, mfn);
>     }

That will do, I think. Then:
Acked-by: Jan Beulich <jbeulich@suse.com>

>> That would apparently mean fiddling with
>> iomem_caps once all vCPU-s have entered x2APIC mode.
> 
> Urg, that seems ugly.  It would also need undoing if the APICs are
> reverted to xAPIC mode?

Right.

>> With LAPICs not
>> normally being elsewhere, question is whether this corner case actually
>> needs dealing with. Yet even if not, commentary may want extending, just
>> to make clear the case was considered?
> 
> As said above, for both HVM and PVH Xen doesn't allow moving the APIC
> MMIO window to anything different than APIC_DEFAULT_PHYS_BASE.

I was talking about the real one Xen uses.

Jan
Roger Pau Monné Feb. 20, 2025, 4:16 p.m. UTC | #6
On Thu, Feb 20, 2025 at 05:05:44PM +0100, Jan Beulich wrote:
> On 20.02.2025 16:40, Roger Pau Monné wrote:
> > On Thu, Feb 20, 2025 at 02:30:38PM +0100, Jan Beulich wrote:
> >> On 20.02.2025 09:55, Roger Pau Monné wrote:
> >>> On Thu, Feb 20, 2025 at 09:33:46AM +0100, Jan Beulich wrote:
> >>>> On 19.02.2025 17:48, Roger Pau Monne wrote:
> >>>>> Note that the restriction to map the local APIC page is enforced
> >>>>> separately, and that continues to be present.  Additionally make sure the
> >>>>> emulated local APIC page is also not mapped, in case dom0 is using it.
> >>>>
> >>>> But that's in GFN space, not in MFN one. Why would that matter for iomem_caps?
> >>>
> >>> It's required to avoid arch_iommu_hwdom_init() creating an identity
> >>> mapping for APIC_DEFAULT_PHYS_BASE, which would prevent the local APIC
> >>> emulation from being used.
> >>
> >> Hmm, yes, on one hand such a mapping would be created by default, as we
> >> default to "dom0-iommu=map-reserved". Otoh that mapping would be replaced
> >> before Dom0 is actually started, via the domain_creation_finished() hook.
> >> On (modern) VMX that is. So yes, on old VMX and on SVM the slot would need
> >> to remain unpopulated. Otoh, when the physical LAPICs are elsewhere and
> >> when the domain is in x2APIC mode, there would be no reason to disallow
> >> Dom0 access to that page.
> > 
> > Right, but that's now how dom0 is started ATM, as the local APIC is
> > unconditionally started in xAPIC mode and at APIC_DEFAULT_PHYS_BASE.
> > 
> > I could use vlapic_base_address() against vCPU#0 vlapic, but even in
> > guest_wrmsr_apic_base() we don't allow moving the local APIC MMIO
> > region, and hence I assumed it was fine to just use
> > APIC_DEFAULT_PHYS_BASE here.  Note in pvh_setup_acpi_madt() Xen also
> > hardcodes the local APIC address to APIC_DEFAULT_PHYS_BASE.
> > 
> > Would you be fine if I expand the comment so it's:
> > 
> >     /* If using an emulated local APIC make sure its MMIO is unpopulated. */
> >     if ( has_vlapic(d) )
> >     {
> >         /* Xen doesn't allow changing the local APIC MMIO window position. */
> >         mfn = paddr_to_pfn(APIC_DEFAULT_PHYS_BASE);
> >         rc |= iomem_deny_access(d, mfn, mfn);
> >     }
> 
> That will do, I think. Then:
> Acked-by: Jan Beulich <jbeulich@suse.com>

Thanks.

> >> That would apparently mean fiddling with
> >> iomem_caps once all vCPU-s have entered x2APIC mode.
> > 
> > Urg, that seems ugly.  It would also need undoing if the APICs are
> > reverted to xAPIC mode?
> 
> Right.
> 
> >> With LAPICs not
> >> normally being elsewhere, question is whether this corner case actually
> >> needs dealing with. Yet even if not, commentary may want extending, just
> >> to make clear the case was considered?
> > 
> > As said above, for both HVM and PVH Xen doesn't allow moving the APIC
> > MMIO window to anything different than APIC_DEFAULT_PHYS_BASE.
> 
> I was talking about the real one Xen uses.

Oh, OK, I was confused by the context I think, sorry then.

Roger.
diff mbox series

Patch

diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
index a735e3650c28..3cda0c2c8765 100644
--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -554,6 +554,12 @@  int __init dom0_setup_permissions(struct domain *d)
         mfn = paddr_to_pfn(mp_lapic_addr);
         rc |= iomem_deny_access(d, mfn, mfn);
     }
+    /* If using an emulated local APIC make sure its MMIO is unpopulated. */
+    if ( has_vlapic(d) )
+    {
+        mfn = paddr_to_pfn(APIC_DEFAULT_PHYS_BASE);
+        rc |= iomem_deny_access(d, mfn, mfn);
+    }
     /* I/O APICs. */
     for ( i = 0; i < nr_ioapics; i++ )
     {
@@ -563,10 +569,6 @@  int __init dom0_setup_permissions(struct domain *d)
              !rangeset_contains_singleton(mmio_ro_ranges, mfn) )
             rc |= iomem_deny_access(d, mfn, mfn);
     }
-    /* MSI range. */
-    rc |= iomem_deny_access(d, paddr_to_pfn(MSI_ADDR_BASE_LO),
-                            paddr_to_pfn(MSI_ADDR_BASE_LO +
-                                         MSI_ADDR_DEST_ID_MASK));
     /* HyperTransport range. */
     if ( boot_cpu_data.x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON) )
     {