diff mbox series

[v2] x86/msr: expose MSR_FAM10H_MMIO_CONF_BASE on AMD

Message ID 20250303091908.38846-1-roger.pau@citrix.com (mailing list archive)
State New
Headers show
Series [v2] x86/msr: expose MSR_FAM10H_MMIO_CONF_BASE on AMD | expand

Commit Message

Roger Pau Monné March 3, 2025, 9:19 a.m. UTC
The MMIO_CONF_BASE reports the base of the MCFG range on AMD systems.
Currently Linux is unconditionally attempting to read the MSR without a
safe MSR accessor, and since Xen doesn't allow access to it Linux reports
the following error:

unchecked MSR access error: RDMSR from 0xc0010058 at rIP: 0xffffffff8101d19f (xen_do_read_msr+0x7f/0xa0)
Call Trace:
 <TASK>
 ? ex_handler_msr+0x11e/0x150
 ? fixup_exception+0x81/0x300
 ? exc_general_protection+0x138/0x410
 ? asm_exc_general_protection+0x22/0x30
 ? xen_do_read_msr+0x7f/0xa0
 xen_read_msr+0x1e/0x30
 amd_get_mmconfig_range+0x2b/0x80
 quirk_amd_mmconfig_area+0x28/0x100
 ? quirk_system_pci_resources+0x2b/0x150
 pnp_fixup_device+0x39/0x50
 __pnp_add_device+0xf/0x150
 pnp_add_device+0x3d/0x100
 ? __pfx_pnpacpi_allocated_resource+0x10/0x10
 ? __pfx_pnpacpi_allocated_resource+0x10/0x10
 ? acpi_walk_resources+0xbb/0xd0
 pnpacpi_add_device_handler+0x1f9/0x280
 acpi_ns_get_device_callback+0x104/0x1c0
 ? _raw_spin_unlock_irqrestore+0x18/0x20
 ? down_timeout+0x3a/0x60
 ? _raw_spin_lock_irqsave+0x14/0x40
 acpi_ns_walk_namespace+0x1d0/0x260
 ? _raw_spin_unlock_irqrestore+0x18/0x20
 ? __pfx_acpi_ns_get_device_callback+0x10/0x10
 acpi_get_devices+0x8a/0xb0
 ? __pfx_pnpacpi_add_device_handler+0x10/0x10
 ? __pfx_pnpacpi_init+0x10/0x10
 pnpacpi_init+0x50/0x80
 do_one_initcall+0x46/0x2e0
 kernel_init_freeable+0x1da/0x2f0
 ? __pfx_kernel_init+0x10/0x10
 kernel_init+0x16/0x1b0
 ret_from_fork+0x30/0x50
 ? __pfx_kernel_init+0x10/0x10
 ret_from_fork_asm+0x1b/0x30
 </TASK>

Such access is conditional to the presence of a device with PnP ID
"PNP0c01", which triggers the execution of the quirk_amd_mmconfig_area()
function.  Note that prior to commit 3fac3734c43a MSR accesses when running
as a PV guest would always use the safe variant, and thus silently handle
the #GP.

Fix by allowing access to the MSR on AMD systems, returning 0 for
unprivileged domains (MMIO configuration space disabled), and the native
value for the hardware domain.

The non hardware domain logic will need to be adjusted if in the future we
expose an MCFG region to such domains.

Write attempts to the MSR will still result in #GP for all domain types.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v2:
 - Expand commit message to note which device triggers the MSR read.
---
 xen/arch/x86/msr.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

Comments

Andrew Cooper March 3, 2025, 1:41 p.m. UTC | #1
On 03/03/2025 9:19 am, Roger Pau Monne wrote:
> The MMIO_CONF_BASE reports the base of the MCFG range on AMD systems.
> Currently Linux is unconditionally attempting to read the MSR without a
> safe MSR accessor, and since Xen doesn't allow access to it Linux reports
> the following error:
>
> unchecked MSR access error: RDMSR from 0xc0010058 at rIP: 0xffffffff8101d19f (xen_do_read_msr+0x7f/0xa0)
> Call Trace:
>  <TASK>
>  ? ex_handler_msr+0x11e/0x150
>  ? fixup_exception+0x81/0x300
>  ? exc_general_protection+0x138/0x410
>  ? asm_exc_general_protection+0x22/0x30
>  ? xen_do_read_msr+0x7f/0xa0
>  xen_read_msr+0x1e/0x30
>  amd_get_mmconfig_range+0x2b/0x80
>  quirk_amd_mmconfig_area+0x28/0x100
>  ? quirk_system_pci_resources+0x2b/0x150
>  pnp_fixup_device+0x39/0x50
>  __pnp_add_device+0xf/0x150
>  pnp_add_device+0x3d/0x100
>  ? __pfx_pnpacpi_allocated_resource+0x10/0x10
>  ? __pfx_pnpacpi_allocated_resource+0x10/0x10
>  ? acpi_walk_resources+0xbb/0xd0
>  pnpacpi_add_device_handler+0x1f9/0x280
>  acpi_ns_get_device_callback+0x104/0x1c0
>  ? _raw_spin_unlock_irqrestore+0x18/0x20
>  ? down_timeout+0x3a/0x60
>  ? _raw_spin_lock_irqsave+0x14/0x40
>  acpi_ns_walk_namespace+0x1d0/0x260
>  ? _raw_spin_unlock_irqrestore+0x18/0x20
>  ? __pfx_acpi_ns_get_device_callback+0x10/0x10
>  acpi_get_devices+0x8a/0xb0
>  ? __pfx_pnpacpi_add_device_handler+0x10/0x10
>  ? __pfx_pnpacpi_init+0x10/0x10
>  pnpacpi_init+0x50/0x80
>  do_one_initcall+0x46/0x2e0
>  kernel_init_freeable+0x1da/0x2f0
>  ? __pfx_kernel_init+0x10/0x10
>  kernel_init+0x16/0x1b0
>  ret_from_fork+0x30/0x50
>  ? __pfx_kernel_init+0x10/0x10
>  ret_from_fork_asm+0x1b/0x30
>  </TASK>
>
> Such access is conditional to the presence of a device with PnP ID
> "PNP0c01", which triggers the execution of the quirk_amd_mmconfig_area()
> function.  Note that prior to commit 3fac3734c43a MSR accesses when running
> as a PV guest would always use the safe variant, and thus silently handle
> the #GP.
>
> Fix by allowing access to the MSR on AMD systems, returning 0 for
> unprivileged domains (MMIO configuration space disabled), and the native
> value for the hardware domain.
>
> The non hardware domain logic will need to be adjusted if in the future we
> expose an MCFG region to such domains.
>
> Write attempts to the MSR will still result in #GP for all domain types.
>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> Changes since v2:
>  - Expand commit message to note which device triggers the MSR read.
> ---
>  xen/arch/x86/msr.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
>
> diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
> index 1550fd9ec9f3..c1e616a3a757 100644
> --- a/xen/arch/x86/msr.c
> +++ b/xen/arch/x86/msr.c
> @@ -318,6 +318,21 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t *val)
>          *val = 0;
>          break;
>  
> +    case MSR_FAM10H_MMIO_CONF_BASE:
> +        if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
> +            goto gp_fault;
> +
> +        /*
> +         * Report MMIO configuration space is disabled unconditionally for
> +         * domUs, as the emulated chipset doesn't support ECAM.  For dom0
> +         * return the hardware value.
> +         */
> +        *val = 0;
> +        if ( is_hardware_domain(d) && rdmsr_safe(msr, *val) )
> +            goto gp_fault;
> +
> +        break;

It doesn't matter right now, but reporting MMCFG disable is likely to
interfere with Q35 support when we do present such a range.

For PVH dom0, do we guarantee this range is identity mapped?  Or at
least, doesn't have plain RAM in?

~Andrew
Roger Pau Monné March 3, 2025, 2:13 p.m. UTC | #2
On Mon, Mar 03, 2025 at 01:41:15PM +0000, Andrew Cooper wrote:
> On 03/03/2025 9:19 am, Roger Pau Monne wrote:
> > The MMIO_CONF_BASE reports the base of the MCFG range on AMD systems.
> > Currently Linux is unconditionally attempting to read the MSR without a
> > safe MSR accessor, and since Xen doesn't allow access to it Linux reports
> > the following error:
> >
> > unchecked MSR access error: RDMSR from 0xc0010058 at rIP: 0xffffffff8101d19f (xen_do_read_msr+0x7f/0xa0)
> > Call Trace:
> >  <TASK>
> >  ? ex_handler_msr+0x11e/0x150
> >  ? fixup_exception+0x81/0x300
> >  ? exc_general_protection+0x138/0x410
> >  ? asm_exc_general_protection+0x22/0x30
> >  ? xen_do_read_msr+0x7f/0xa0
> >  xen_read_msr+0x1e/0x30
> >  amd_get_mmconfig_range+0x2b/0x80
> >  quirk_amd_mmconfig_area+0x28/0x100
> >  ? quirk_system_pci_resources+0x2b/0x150
> >  pnp_fixup_device+0x39/0x50
> >  __pnp_add_device+0xf/0x150
> >  pnp_add_device+0x3d/0x100
> >  ? __pfx_pnpacpi_allocated_resource+0x10/0x10
> >  ? __pfx_pnpacpi_allocated_resource+0x10/0x10
> >  ? acpi_walk_resources+0xbb/0xd0
> >  pnpacpi_add_device_handler+0x1f9/0x280
> >  acpi_ns_get_device_callback+0x104/0x1c0
> >  ? _raw_spin_unlock_irqrestore+0x18/0x20
> >  ? down_timeout+0x3a/0x60
> >  ? _raw_spin_lock_irqsave+0x14/0x40
> >  acpi_ns_walk_namespace+0x1d0/0x260
> >  ? _raw_spin_unlock_irqrestore+0x18/0x20
> >  ? __pfx_acpi_ns_get_device_callback+0x10/0x10
> >  acpi_get_devices+0x8a/0xb0
> >  ? __pfx_pnpacpi_add_device_handler+0x10/0x10
> >  ? __pfx_pnpacpi_init+0x10/0x10
> >  pnpacpi_init+0x50/0x80
> >  do_one_initcall+0x46/0x2e0
> >  kernel_init_freeable+0x1da/0x2f0
> >  ? __pfx_kernel_init+0x10/0x10
> >  kernel_init+0x16/0x1b0
> >  ret_from_fork+0x30/0x50
> >  ? __pfx_kernel_init+0x10/0x10
> >  ret_from_fork_asm+0x1b/0x30
> >  </TASK>
> >
> > Such access is conditional to the presence of a device with PnP ID
> > "PNP0c01", which triggers the execution of the quirk_amd_mmconfig_area()
> > function.  Note that prior to commit 3fac3734c43a MSR accesses when running
> > as a PV guest would always use the safe variant, and thus silently handle
> > the #GP.
> >
> > Fix by allowing access to the MSR on AMD systems, returning 0 for
> > unprivileged domains (MMIO configuration space disabled), and the native
> > value for the hardware domain.
> >
> > The non hardware domain logic will need to be adjusted if in the future we
> > expose an MCFG region to such domains.
> >
> > Write attempts to the MSR will still result in #GP for all domain types.
> >
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> > Changes since v2:
> >  - Expand commit message to note which device triggers the MSR read.
> > ---
> >  xen/arch/x86/msr.c | 15 +++++++++++++++
> >  1 file changed, 15 insertions(+)
> >
> > diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
> > index 1550fd9ec9f3..c1e616a3a757 100644
> > --- a/xen/arch/x86/msr.c
> > +++ b/xen/arch/x86/msr.c
> > @@ -318,6 +318,21 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t *val)
> >          *val = 0;
> >          break;
> >  
> > +    case MSR_FAM10H_MMIO_CONF_BASE:
> > +        if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
> > +            goto gp_fault;
> > +
> > +        /*
> > +         * Report MMIO configuration space is disabled unconditionally for
> > +         * domUs, as the emulated chipset doesn't support ECAM.  For dom0
> > +         * return the hardware value.
> > +         */
> > +        *val = 0;
> > +        if ( is_hardware_domain(d) && rdmsr_safe(msr, *val) )
> > +            goto gp_fault;
> > +
> > +        break;
> 
> It doesn't matter right now, but reporting MMCFG disable is likely to
> interfere with Q35 support when we do present such a range.

Yup, that's why I mention that this will likely need  to be adjusted.
However Linux only reads MMIO_CONF_BASE if a PnP device with ID
"PNP0c01" is present, not sure whether that's (or will be the case)
with Q35 support.

> For PVH dom0, do we guarantee this range is identity mapped?  Or at
> least, doesn't have plain RAM in?

Yes, for PVH dom0 ECAM range(s) are always identity mapped.

Thanks, Roger.
diff mbox series

Patch

diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c
index 1550fd9ec9f3..c1e616a3a757 100644
--- a/xen/arch/x86/msr.c
+++ b/xen/arch/x86/msr.c
@@ -318,6 +318,21 @@  int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t *val)
         *val = 0;
         break;
 
+    case MSR_FAM10H_MMIO_CONF_BASE:
+        if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) )
+            goto gp_fault;
+
+        /*
+         * Report MMIO configuration space is disabled unconditionally for
+         * domUs, as the emulated chipset doesn't support ECAM.  For dom0
+         * return the hardware value.
+         */
+        *val = 0;
+        if ( is_hardware_domain(d) && rdmsr_safe(msr, *val) )
+            goto gp_fault;
+
+        break;
+
     case MSR_VIRT_SPEC_CTRL:
         if ( !cp->extd.virt_ssbd )
             goto gp_fault;