
[RFC] xen/x86: allow overlaps with non-RAM regions

Message ID alpine.DEB.2.22.394.2504031755440.3529306@ubuntu-linux-20-04-desktop (mailing list archive)
State New

Commit Message

Stefano Stabellini April 4, 2025, 1:01 a.m. UTC
On one Sapphire AMD x86 board, I see this:


(XEN) [0000003943ca6ff2]  [00000000f0000000, 00000000f7ffffff] (reserved)
(XEN) [00000039460886d9]  [00000000fd000000, 00000000ffffffff] (reserved)
[...]
(XEN) [    4.612235] 0000:02:00.0: not mapping BAR [fea00, fea03] invalid position


Linux boots fine on this platform but Linux as Dom0 on Xen does not.
This is because the pci_check_bar->is_memory_hole check fails due to the
MMIO region overlapping with the EFI reserved region.

While I think ideally this should not happen, as you can imagine users
are never happy when Linux baremetal boots fine, and Linux on Xen does
not.

This patch fixes the boot issue by relaxing the is_memory_hole check.

Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>

Comments

Jan Beulich April 4, 2025, 8:07 a.m. UTC | #1
On 04.04.2025 03:01, Stefano Stabellini wrote:
> On one Sapphire AMD x86 board, I see this:
> 
> 
> (XEN) [0000003943ca6ff2]  [00000000f0000000, 00000000f7ffffff] (reserved)
> (XEN) [00000039460886d9]  [00000000fd000000, 00000000ffffffff] (reserved)
> [...]
> (XEN) [    4.612235] 0000:02:00.0: not mapping BAR [fea00, fea03] invalid position

I, too, see something like this on an SPR system. That's a firmware issue,
which hence first of all should be dealt with at the firmware side.

> Linux boots fine on this platform but Linux as Dom0 on Xen does not.
> This is because the pci_check_bar->is_memory_hole check fails due to the
> MMIO region overlapping with the EFI reserved region.

And then what's the actual, observable problem? On my system I haven't
noticed anything going wrong yet, albeit the affected device is also left
without a driver.

Also aiui you strictly mean PVH Dom0 here?

> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -797,6 +797,9 @@ bool is_memory_hole(mfn_t start, mfn_t end)
>          if ( !entry->size )
>              continue;
>  
> +        if ( entry->type > 1 )
> +            continue;

I'm sorry to ask, but what's a literal 1 here? I'm pretty convinced we
would want to still object to overlaps with unusable ranges, for example.
Yet by open-coding what E820_RAM expands to you completely hide what this
check is about. Yes, this is an RFC, but even there such context is
important.
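
To make the point about named constants concrete, here is a minimal standalone sketch of one way the relaxed check could be spelled. It is not the Xen implementation. The E820_* values are the conventional type numbers, restated here as assumptions; struct e820_entry, overlap_tolerated() and is_memory_hole_sketch() are hypothetical names; and the policy of tolerating only E820_RESERVED is just one reading of the comment above, not something the thread settled on.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Conventional E820 type numbers; restated here as assumptions. */
#define E820_RAM      1
#define E820_RESERVED 2
#define E820_ACPI     3
#define E820_NVS      4
#define E820_UNUSABLE 5

#define PAGE_SHIFT 12
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)

/* Simplified stand-in for an E820 map entry (hypothetical layout). */
struct e820_entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/* Hypothetical policy: only plain reserved ranges may overlap a BAR. */
static bool overlap_tolerated(uint32_t type)
{
    return type == E820_RESERVED;
}

/* True if page frames [s, e] overlap no entry that must be avoided. */
static bool is_memory_hole_sketch(const struct e820_entry *map, size_t nr,
                                  uint64_t s, uint64_t e)
{
    for ( size_t i = 0; i < nr; i++ )
    {
        const struct e820_entry *entry = &map[i];

        if ( !entry->size )
            continue;

        /* Named type instead of a literal 1, so the intent is visible. */
        if ( overlap_tolerated(entry->type) )
            continue;

        if ( s <= PFN_DOWN(entry->addr + entry->size - 1) &&
             PFN_DOWN(entry->addr) <= e )
            return false;
    }

    return true;
}
```

With this spelling a BAR inside a reserved range passes, while overlaps with RAM, ACPI, NVS or unusable ranges are still rejected, unlike with "entry->type > 1".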

Furthermore my general take here is: We shouldn't simply silence issues
arising from firmware doing odd things. My take here is still the same
as the position I took when I still was maintainer of the EFI code in
Xen: We shouldn't by default work around such issues, when doing so may
negatively affect systems not exposing such odd behavior.

Finally a Misra-related observation while looking at this: Isn't
is_memory_hole() unreachable code in a !HVM configuration?

Jan
Roger Pau Monné April 4, 2025, 10:28 a.m. UTC | #2
On Thu, Apr 03, 2025 at 06:01:42PM -0700, Stefano Stabellini wrote:
> On one Sapphire AMD x86 board, I see this:
> 
> 
> (XEN) [0000003943ca6ff2]  [00000000f0000000, 00000000f7ffffff] (reserved)
> (XEN) [00000039460886d9]  [00000000fd000000, 00000000ffffffff] (reserved)
> [...]
> (XEN) [    4.612235] 0000:02:00.0: not mapping BAR [fea00, fea03] invalid position
> 
> 
> Linux boots fine on this platform but Linux as Dom0 on Xen does not.
> This is because the pci_check_bar->is_memory_hole check fails due to the
> MMIO region overlapping with the EFI reserved region.

That's weird.  (Partially) the reason to not attempt to map such BAR
is that it should already be mapped, because at dom0 creation time all
reserved regions are added to the p2m (see arch_iommu_hwdom_init()).
If that's not the case we should figure out why this reserved region
is not added to dom0 p2m as part of arch_iommu_hwdom_init().

Can you paste the dom0 build output when booted with `iommu=verbose
dom0=pvh,verbose`?  Does using `dom0=pvh,verbose,pf-fixup` solve the
boot issue? (and can you paste the output if it does)

The issue with allowing BARs to modify p2m reserved regions is that if
memory decoding is disabled for the PCI device, the BAR will be
unmapped from the p2m, thus creating a hole in the p2m for a reserved
region, which would be undesirable IMO.

Thanks, Roger.
Jason Andryuk April 10, 2025, 8:55 p.m. UTC | #3
On 2025-04-04 04:07, Jan Beulich wrote:
> On 04.04.2025 03:01, Stefano Stabellini wrote:
>> On one Sapphire AMD x86 board, I see this:
>>
>>
>> (XEN) [0000003943ca6ff2]  [00000000f0000000, 00000000f7ffffff] (reserved)
>> (XEN) [00000039460886d9]  [00000000fd000000, 00000000ffffffff] (reserved)
>> [...]
>> (XEN) [    4.612235] 0000:02:00.0: not mapping BAR [fea00, fea03] invalid position
> 
> I, too, see something like this on an SPR system. That's a firmware issue,
> which hence first of all should be dealt with at the firmware side.
> 
>> Linux boots fine on this platform but Linux as Dom0 on Xen does not.
>> This is because the pci_check_bar->is_memory_hole check fails due to the
>> MMIO region overlapping with the EFI reserved region.
> 
> And then what's the actual, observable problem? On my system I haven't
> noticed anything going wrong yet, albeit the affected device is also left
> without a driver.

The nvme drive (0000:02:00.0 with the invalid position above) fails to 
work and no root drive is available.

> Also aiui you strictly mean PVH Dom0 here?

I think we only looked at PVH Dom0.

Regards,
Jason
Jason Andryuk April 10, 2025, 8:55 p.m. UTC | #4
On 2025-04-04 06:28, Roger Pau Monné wrote:
> On Thu, Apr 03, 2025 at 06:01:42PM -0700, Stefano Stabellini wrote:
>> On one Sapphire AMD x86 board, I see this:
>>
>>
>> (XEN) [0000003943ca6ff2]  [00000000f0000000, 00000000f7ffffff] (reserved)
>> (XEN) [00000039460886d9]  [00000000fd000000, 00000000ffffffff] (reserved)
>> [...]
>> (XEN) [    4.612235] 0000:02:00.0: not mapping BAR [fea00, fea03] invalid position
>>
>>
>> Linux boots fine on this platform but Linux as Dom0 on Xen does not.
>> This is because the pci_check_bar->is_memory_hole check fails due to the
>> MMIO region overlapping with the EFI reserved region.
> 
> That's weird.  (Partially) the reason to not attempt to map such BAR
> is that it should already be mapped, because at dom0 creation time all
> reserved regions are added to the p2m (see arch_iommu_hwdom_init()).
> If that's not the case we should figure out why this reserved region
> is not added to dom0 p2m as part of arch_iommu_hwdom_init().

Victor discovered these regions are type 11 EfiMemoryMappedIO, but they 
get converted to e820 RESERVED.  The BAR points into it.

00000f0000000-00000f7ffffff type=11 attr=800000000000100d
00000fd000000-00000fedfffff type=11 attr=800000000000100d
00000fee00000-00000fee00fff type=11 attr=8000000000000001
00000fee01000-00000ffffffff type=11 attr=800000000000100d

Xenia discovered Linux keeps small regions like this reserved, but lets 
larger ones (>= 256kb) become holes.  See the comment in Linux 
arch/x86/platform/efi/efi.c:efi_remove_e820_mmio() around line 301.
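
As a rough model of that size rule (not the Linux code itself), the sketch below classifies a region by the 256 KiB threshold mentioned above. struct efi_region, becomes_hole() and MMIO_HOLE_THRESHOLD are hypothetical names; the value 11 is the type shown in the dump above.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFI_MEMORY_MAPPED_IO 11              /* "type=11" in the dump above */
#define MMIO_HOLE_THRESHOLD  (256 * 1024ULL) /* threshold described above */

/* Hypothetical descriptor; 'end' is inclusive, as in the dump above. */
struct efi_region {
    uint64_t start;
    uint64_t end;
    uint32_t type;
};

/* True if the region would be dropped from the map and become a hole. */
static bool becomes_hole(const struct efi_region *r)
{
    uint64_t size = r->end - r->start + 1;

    return r->type == EFI_MEMORY_MAPPED_IO && size >= MMIO_HOLE_THRESHOLD;
}
```

Applied to the four regions above, only the 4 KiB one at fee00000 would stay reserved under this model; the other three, including the one the BAR points into, would become holes.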

The description of EfiMemoryMappedIO is a little confusing, which is 
probably why its use is unclear.

```
Table 7.5 Memory Type Usage before ExitBootServices()
EfiMemoryMappedIO

Used by system firmware to request that a memory-mapped IO region be 
mapped by the OS to a virtual address so it can be accessed by EFI 
runtime services.

Table 7.6 Memory Type Usage after ExitBootServices()
EfiMemoryMappedIO

This memory is not used by the OS. All system memory-mapped IO 
information should come from ACPI tables.
```

The two after ExitBootServices sentences seem contradictory.  I wonder 
if it should be "Ignore this memory type - All system memory-mapped IO 
information should come from ACPI tables".

> Can you paste the dom0 build output when booted with `iommu=verbose
> dom0=pvh,verbose`?  Does using `dom0=pvh,verbose,pf-fixup` solve the
> boot issue? (and can you paste the output if it does)

pf-fixup did not resolve it.  The vpci pci_check_bar() check is 
independent of pf-fixup from what I can tell.

Regards,
Jason
Roger Pau Monné April 11, 2025, 7:31 a.m. UTC | #5
Thanks Jason for getting back, I'm intrigued by this issue :).

On Thu, Apr 10, 2025 at 04:55:54PM -0400, Jason Andryuk wrote:
> On 2025-04-04 06:28, Roger Pau Monné wrote:
> > On Thu, Apr 03, 2025 at 06:01:42PM -0700, Stefano Stabellini wrote:
> > > On one Sapphire AMD x86 board, I see this:
> > > 
> > > 
> > > (XEN) [0000003943ca6ff2]  [00000000f0000000, 00000000f7ffffff] (reserved)
> > > (XEN) [00000039460886d9]  [00000000fd000000, 00000000ffffffff] (reserved)
> > > [...]
> > > (XEN) [    4.612235] 0000:02:00.0: not mapping BAR [fea00, fea03] invalid position
> > > 
> > > 
> > > Linux boots fine on this platform but Linux as Dom0 on Xen does not.
> > > This is because the pci_check_bar->is_memory_hole check fails due to the
> > > MMIO region overlapping with the EFI reserved region.
> > 
> > That's weird.  (Partially) the reason to not attempt to map such BAR
> > is that it should already be mapped, because at dom0 creation time all
> > reserved regions are added to the p2m (see arch_iommu_hwdom_init()).
> > If that's not the case we should figure out why this reserved region
> > is not added to dom0 p2m as part of arch_iommu_hwdom_init().
> 
> Victor discovered these regions are type 11 EfiMemoryMappedIO, but they get
> converted to e820 RESERVED.  The BAR points into it.
> 
> 00000f0000000-00000f7ffffff type=11 attr=800000000000100d
> 00000fd000000-00000fedfffff type=11 attr=800000000000100d
> 00000fee00000-00000fee00fff type=11 attr=8000000000000001
> 00000fee01000-00000ffffffff type=11 attr=800000000000100d
> 
> Xenia discovered Linux keeps small regions like this reserved, but lets
> larger ones (>= 256kb) become holes.  See the comment in Linux
> arch/x86/platform/efi/efi.c:efi_remove_e820_mmio() around line 301.

Right, but whatever Linux decides to do with the reserved regions
won't affect how Xen maps them into the p2m.  Anything that's reserved
in the e820 should end up identity mapped in the p2m for PVH dom0,
unless it's being exclusively used by Xen (see
dom0_setup_permissions() use of iomem_deny_access() to deny dom0
access to some MMIO regions).

> The description of EfiMemoryMappedIO is a little confusing, which is
> probably why its use in unclear.
> 
> ```
> Table 7.5 Memory Type Usage before ExitBootServices()
> EfiMemoryMappedIO
> 
> Used by system firmware to request that a memory-mapped IO region be mapped
> by the OS to a virtual address so it can be accessed by EFI runtime
> services.
> 
> Table 7.6 Memory Type Usage after ExitBootServices()
> EfiMemoryMappedIO
> 
> This memory is not used by the OS. All system memory-mapped IO information
> should come from ACPI tables.
> ```
> 
> The two after ExitBootServices sentences seem contradictory.  I wonder if it
> should be "Ignore this memory type - All system memory-mapped IO information
> should come from ACPI tables".

Not very helpful indeed.  The description in "before
ExitBootServices()" seems more sensible to me: if the MMIO region is
used by runtime services Xen should ensure it's always mapped in the
dom0 p2m (which Xen should in principle already do).

> > Can you paste the dom0 build output when booted with `iommu=verbose
> > dom0=pvh,verbose`?

Would it be possible to see the output of a debug=y build when booted
with `iommu=verbose dom0=pvh,verbose` (with or without pf-fixup,
either is fine).

I'm especially interested in the rangeset contents printed after "d0:
identity mappings for IOMMU:", but if possible would like to see the
full boot log (including Linux dom0).

> > Does using `dom0=pvh,verbose,pf-fixup` solve the
> > boot issue? (and can you paste the output if it does)
> 
> pf-fixup did not resolve it.  The vpci pci_check_bar() check is independent
> of pf-fixup from what I can tell.

Yup, the check is independent, but pf-fixup would create additional
p2m mappings if required (note this is only available on staging).

Also, when using a build of Xen from staging you should now get
messages about unhandled memory accesses by a PVH dom0 when not using
the `pf-fixup` option.

Thanks, Roger.
Jan Beulich April 11, 2025, 7:56 a.m. UTC | #6
On 10.04.2025 22:55, Jason Andryuk wrote:
> On 2025-04-04 04:07, Jan Beulich wrote:
>> On 04.04.2025 03:01, Stefano Stabellini wrote:
>>> On one Sapphire AMD x86 board, I see this:
>>>
>>>
>>> (XEN) [0000003943ca6ff2]  [00000000f0000000, 00000000f7ffffff] (reserved)
>>> (XEN) [00000039460886d9]  [00000000fd000000, 00000000ffffffff] (reserved)
>>> [...]
>>> (XEN) [    4.612235] 0000:02:00.0: not mapping BAR [fea00, fea03] invalid position
>>
>> I, too, see something like this on an SPR system. That's a firmware issue,
>> which hence first of all should be dealt with at the firmware side.
>>
>>> Linux boots fine on this platform but Linux as Dom0 on Xen does not.
>>> This is because the pci_check_bar->is_memory_hole check fails due to the
>>> MMIO region overlapping with the EFI reserved region.
>>
>> And then what's the actual, observable problem? On my system I haven't
>> noticed anything going wrong yet, albeit the affected device is also left
>> without a driver.
> 
> The nvme drive (0000:02:00.0 with the invalid position above) fails to 
> work and no root drive is available.

And what - if anything - does baremetal Linux say about this pretty obvious
firmware bug?

Jan
Jan Beulich April 11, 2025, 8:07 a.m. UTC | #7
On 11.04.2025 09:31, Roger Pau Monné wrote:
> Yup, the check is independent, but pf-fixup would create additional
> p2m mappings if required (note this is only available on staging).

It's also in 4.19.2 and on the 4.20 branch. 4.18 is where I couldn't
easily backport it to.

Jan
Jan Beulich April 11, 2025, 8:15 a.m. UTC | #8
On 10.04.2025 22:55, Jason Andryuk wrote:
> On 2025-04-04 06:28, Roger Pau Monné wrote:
>> On Thu, Apr 03, 2025 at 06:01:42PM -0700, Stefano Stabellini wrote:
>>> On one Sapphire AMD x86 board, I see this:
>>>
>>>
>>> (XEN) [0000003943ca6ff2]  [00000000f0000000, 00000000f7ffffff] (reserved)
>>> (XEN) [00000039460886d9]  [00000000fd000000, 00000000ffffffff] (reserved)
>>> [...]
>>> (XEN) [    4.612235] 0000:02:00.0: not mapping BAR [fea00, fea03] invalid position
>>>
>>>
>>> Linux boots fine on this platform but Linux as Dom0 on Xen does not.
>>> This is because the pci_check_bar->is_memory_hole check fails due to the
>>> MMIO region overlapping with the EFI reserved region.
>>
>> That's weird.  (Partially) the reason to not attempt to map such BAR
>> is that it should already be mapped, because at dom0 creation time all
>> reserved regions are added to the p2m (see arch_iommu_hwdom_init()).
>> If that's not the case we should figure out why this reserved region
>> is not added to dom0 p2m as part of arch_iommu_hwdom_init().
> 
> Victor discovered these regions are type 11 EfiMemoryMappedIO, but they 
> get converted to e820 RESERVED.  The BAR points into it.
> 
> 00000f0000000-00000f7ffffff type=11 attr=800000000000100d
> 00000fd000000-00000fedfffff type=11 attr=800000000000100d
> 00000fee00000-00000fee00fff type=11 attr=8000000000000001
> 00000fee01000-00000ffffffff type=11 attr=800000000000100d
> 
> Xenia discovered Linux keeps small regions like this reserved, but lets 
> larger ones (>= 256kb) become holes.  See the comment in Linux 
> arch/x86/platform/efi/efi.c:efi_remove_e820_mmio() around line 301.

What a hack. And the mentioning of MMCFG space isn't even correct there,
I think. That space may legitimately be reserved (and at least older
Linux actually checked for that), unlike any ranges where BARs may live.
As to host bridge windows - I dare to question that they always need to
be "large". Similarly nothing guarantees that some non-window space
included in _CRS may not point at a "large" region.

Jan
Jason Andryuk April 11, 2025, 1:45 p.m. UTC | #9
On 2025-04-11 03:31, Roger Pau Monné wrote:
> Thanks Jason for getting back, I'm intrigued by this issue :).
> 
> On Thu, Apr 10, 2025 at 04:55:54PM -0400, Jason Andryuk wrote:
>> On 2025-04-04 06:28, Roger Pau Monné wrote:
>>> On Thu, Apr 03, 2025 at 06:01:42PM -0700, Stefano Stabellini wrote:
>>>> On one Sapphire AMD x86 board, I see this:
>>>>
>>>>
>>>> (XEN) [0000003943ca6ff2]  [00000000f0000000, 00000000f7ffffff] (reserved)
>>>> (XEN) [00000039460886d9]  [00000000fd000000, 00000000ffffffff] (reserved)
>>>> [...]
>>>> (XEN) [    4.612235] 0000:02:00.0: not mapping BAR [fea00, fea03] invalid position
>>>>
>>>>
>>>> Linux boots fine on this platform but Linux as Dom0 on Xen does not.
>>>> This is because the pci_check_bar->is_memory_hole check fails due to the
>>>> MMIO region overlapping with the EFI reserved region.
>>>
>>> That's weird.  (Partially) the reason to not attempt to map such BAR
>>> is that it should already be mapped, because at dom0 creation time all
>>> reserved regions are added to the p2m (see arch_iommu_hwdom_init()).
>>> If that's not the case we should figure out why this reserved region
>>> is not added to dom0 p2m as part of arch_iommu_hwdom_init().
>>
>> Victor discovered these regions are type 11 EfiMemoryMappedIO, but they get
>> converted to e820 RESERVED.  The BAR points into it.
>>
>> 00000f0000000-00000f7ffffff type=11 attr=800000000000100d
>> 00000fd000000-00000fedfffff type=11 attr=800000000000100d
>> 00000fee00000-00000fee00fff type=11 attr=8000000000000001
>> 00000fee01000-00000ffffffff type=11 attr=800000000000100d
>>
>> Xenia discovered Linux keeps small regions like this reserved, but lets
>> larger ones (>= 256kb) become holes.  See the comment in Linux
>> arch/x86/platform/efi/efi.c:efi_remove_e820_mmio() around line 301.
> 
> Right, but whatever Linux decides to do with the reserved regions
> won't affect how Xen maps them into the p2m.  Anything that's reserved
> in the e820 should end up identity mapped in the p2m for PVH dom0,
> unless it's being exclusively used by Xen (see
> dom0_setup_permissions() use of iomem_deny_access() to deny dom0
> access to some MMIO regions).

Oh, my point was more that Baremetal Linux won't have reserved ranges in 
these regions, so there would not be any BAR conflicts.  Though I'm not 
sure if it checks.

If Xen removed them from the memory map, then pci_check_bar() -> 
is_memory_hole() would pass.

>> The description of EfiMemoryMappedIO is a little confusing, which is
>> probably why its use in unclear.
>>
>> ```
>> Table 7.5 Memory Type Usage before ExitBootServices()
>> EfiMemoryMappedIO
>>
>> Used by system firmware to request that a memory-mapped IO region be mapped
>> by the OS to a virtual address so it can be accessed by EFI runtime
>> services.
>>
>> Table 7.6 Memory Type Usage after ExitBootServices()
>> EfiMemoryMappedIO
>>
>> This memory is not used by the OS. All system memory-mapped IO information
>> should come from ACPI tables.
>> ```
>>
>> The two after ExitBootServices sentences seem contradictory.  I wonder if it
>> should be "Ignore this memory type - All system memory-mapped IO information
>> should come from ACPI tables".
> 
> Not very helpful indeed.  The description in "before
> ExitBootServices()" seems more sensible to me: if the MMIO region is
> used by runtime services Xen should ensure it's always mapped in the
> dom0 p2m (which Xen should in principle already do).
> 
>>> Can you paste the dom0 build output when booted with `iommu=verbose
>>> dom0=pvh,verbose`?
> 
> Would it be possible to see the output of a debug=y build when booted
> with `iommu=verbose dom0=pvh,verbose` (with or without pf-fixup,
> either is fine).
> 
> I'm specially interested in the ranggeset contents printed after "d0:
> identity mappings for IOMMU:", but if possible would like to see the
> full boot log (including Linux dom0).

Attached.

Regards,
Jason
Roger Pau Monné April 14, 2025, 8:25 a.m. UTC | #10
On Fri, Apr 11, 2025 at 09:45:26AM -0400, Jason Andryuk wrote:
> On 2025-04-11 03:31, Roger Pau Monné wrote:
> > Thanks Jason for getting back, I'm intrigued by this issue :).
> > 
> > On Thu, Apr 10, 2025 at 04:55:54PM -0400, Jason Andryuk wrote:
> > > On 2025-04-04 06:28, Roger Pau Monné wrote:
> > > > On Thu, Apr 03, 2025 at 06:01:42PM -0700, Stefano Stabellini wrote:
> > > > > On one Sapphire AMD x86 board, I see this:
> > > > > 
> > > > > 
> > > > > (XEN) [0000003943ca6ff2]  [00000000f0000000, 00000000f7ffffff] (reserved)
> > > > > (XEN) [00000039460886d9]  [00000000fd000000, 00000000ffffffff] (reserved)
> > > > > [...]
> > > > > (XEN) [    4.612235] 0000:02:00.0: not mapping BAR [fea00, fea03] invalid position
> > > > > 
> > > > > 
> > > > > Linux boots fine on this platform but Linux as Dom0 on Xen does not.
> > > > > This is because the pci_check_bar->is_memory_hole check fails due to the
> > > > > MMIO region overlapping with the EFI reserved region.
> > > > 
> > > > That's weird.  (Partially) the reason to not attempt to map such BAR
> > > > is that it should already be mapped, because at dom0 creation time all
> > > > reserved regions are added to the p2m (see arch_iommu_hwdom_init()).
> > > > If that's not the case we should figure out why this reserved region
> > > > is not added to dom0 p2m as part of arch_iommu_hwdom_init().
> > > 
> > > Victor discovered these regions are type 11 EfiMemoryMappedIO, but they get
> > > converted to e820 RESERVED.  The BAR points into it.
> > > 
> > > 00000f0000000-00000f7ffffff type=11 attr=800000000000100d
> > > 00000fd000000-00000fedfffff type=11 attr=800000000000100d
> > > 00000fee00000-00000fee00fff type=11 attr=8000000000000001
> > > 00000fee01000-00000ffffffff type=11 attr=800000000000100d
> > > 
> > > Xenia discovered Linux keeps small regions like this reserved, but lets
> > > larger ones (>= 256kb) become holes.  See the comment in Linux
> > > arch/x86/platform/efi/efi.c:efi_remove_e820_mmio() around line 301.
> > 
> > Right, but whatever Linux decides to do with the reserved regions
> > won't affect how Xen maps them into the p2m.  Anything that's reserved
> > in the e820 should end up identity mapped in the p2m for PVH dom0,
> > unless it's being exclusively used by Xen (see
> > dom0_setup_permissions() use of iomem_deny_access() to deny dom0
> > access to some MMIO regions).
> 
> Oh, my point was more that Baremetal Linux won't have reserved ranges in
> these regions, so there would not be any BAR conflicts.  Though I'm not sure
> if it checks.
> 
> If Xen removed them from the memory map, then pci_check_bar() ->
> is_memory_hole() would pass.

Yes, it would pass.  The underlying issue however is that such region
should already be mapped in the p2m, and hence accesses shouldn't
fault.

When building dom0:

(XEN) [    7.943830] *** Building a PVH Dom0 ***
(XEN) [    7.955960] d0: identity mappings for IOMMU:
(XEN) [    7.965494]  [00000000a0, 00000000ff] RW
(XEN) [    7.974336]  [0000009bff, 0000009fff] RW
(XEN) [    7.983172]  [00000cabc9, 00000cc14c] RW
(XEN) [    7.992049]  [00000cc389, 00000cc389] RW
(XEN) [    8.000890]  [00000cc70a, 00000cd1fe] RW
(XEN) [    8.010065]  [00000ce000, 00000cffff] RW
(XEN) [    8.018904]  [00000fd000, 00000fd2ff] RW
(XEN) [    8.027745]  [00000fd304, 00000febff] RW
(XEN) [    8.036584]  [00000fec02, 00000fedff] RW
(XEN) [    8.045546]  [00000fee01, 00000fffff] RW
(XEN) [    8.054519]  [000080f340, 00008501ff] RW

All the ranges listed here are added to the p2m, and hence the range
[0xfea00, 0xfea03] should be covered by:

(XEN) [    8.027745]  [00000fd304, 00000febff] RW

The expectation is that those mappings are never removed from dom0
p2m.
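
The coverage claim can be checked mechanically. The sketch below is a standalone model, not Xen's rangeset code: it copies the page-frame ranges from the log above into a table and looks up which one fully contains a given BAR. struct pfn_range, identity_map and covering_range() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Inclusive page-frame range, as printed in the boot log. */
struct pfn_range {
    uint64_t s;
    uint64_t e;
};

/* Ranges copied from the "identity mappings for IOMMU" output above. */
static const struct pfn_range identity_map[] = {
    { 0x000a0,  0x000ff  },
    { 0x09bff,  0x09fff  },
    { 0xcabc9,  0xcc14c  },
    { 0xcc389,  0xcc389  },
    { 0xcc70a,  0xcd1fe  },
    { 0xce000,  0xcffff  },
    { 0xfd000,  0xfd2ff  },
    { 0xfd304,  0xfebff  },
    { 0xfec02,  0xfedff  },
    { 0xfee01,  0xfffff  },
    { 0x80f340, 0x8501ff },
};

/* Return the listed range fully containing [s, e], or NULL if none does. */
static const struct pfn_range *covering_range(uint64_t s, uint64_t e)
{
    size_t nr = sizeof(identity_map) / sizeof(identity_map[0]);

    for ( size_t i = 0; i < nr; i++ )
        if ( identity_map[i].s <= s && e <= identity_map[i].e )
            return &identity_map[i];

    return NULL;
}
```

Calling covering_range(0xfea00, 0xfea03) returns the [fd304, febff] entry, while page frames the log leaves out, such as 0xfec00 or 0xfee00, yield NULL.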

> > > The description of EfiMemoryMappedIO is a little confusing, which is
> > > probably why its use in unclear.
> > > 
> > > ```
> > > Table 7.5 Memory Type Usage before ExitBootServices()
> > > EfiMemoryMappedIO
> > > 
> > > Used by system firmware to request that a memory-mapped IO region be mapped
> > > by the OS to a virtual address so it can be accessed by EFI runtime
> > > services.
> > > 
> > > Table 7.6 Memory Type Usage after ExitBootServices()
> > > EfiMemoryMappedIO
> > > 
> > > This memory is not used by the OS. All system memory-mapped IO information
> > > should come from ACPI tables.
> > > ```
> > > 
> > > The two after ExitBootServices sentences seem contradictory.  I wonder if it
> > > should be "Ignore this memory type - All system memory-mapped IO information
> > > should come from ACPI tables".
> > 
> > Not very helpful indeed.  The description in "before
> > ExitBootServices()" seems more sensible to me: if the MMIO region is
> > used by runtime services Xen should ensure it's always mapped in the
> > dom0 p2m (which Xen should in principle already do).
> > 
> > > > Can you paste the dom0 build output when booted with `iommu=verbose
> > > > dom0=pvh,verbose`?
> > 
> > Would it be possible to see the output of a debug=y build when booted
> > with `iommu=verbose dom0=pvh,verbose` (with or without pf-fixup,
> > either is fine).
> > 
> > I'm specially interested in the ranggeset contents printed after "d0:
> > identity mappings for IOMMU:", but if possible would like to see the
> > full boot log (including Linux dom0).
> 
> Attached.
> 
> Regards,
> Jason

> (XEN) [00000039f8a5e9d8] Xen version 4.21-unstable (root@) (gcc (Alpine 12.2.1_git20220924-r10) 12.2.1 20220924) debug=y Mon Apr  7 21:19:20 UTC 2025
> (XEN) [00000039fd29ca67] Latest ChangeSet:
> (XEN) [00000039fe824ec4] build-id: 5b79e5a1e7c0b3f356c8d413924be90e6610ea62
> (XEN) [0000003a00cfcb68] Console output is synchronous.
> (XEN) [0000003a02842e16] CPU Vendor: AMD, Family 23 (0x17), Model 96 (0x60), Stepping 1 (raw 00860f01)
> (XEN) [0000003a05a066b4] BSP microcode revision: 0x0860010c
> (XEN) [0000003a077362ac] Bootloader: GRUB 2.13
> (XEN) [0000003a08e2dcdd] Command line: console=com1 com1=57600,8n1,0x3F8,4 sched=null loglvl=all guest_loglvl=all console_timestamps=boot iommu=verbose dom0=pvh,verbose,pf-fixup dom0_max_vcpus=4 dom0_mem=4G argo=1,mac-permissive=1 sync_console noreboot wow
> (XEN) [0000003a109a0624] Xen image load base address: 0xc6600000
> (XEN) [0000003a1293662b] Video information:
> (XEN) [0000003a13ebea87]  VGA is graphics mode 1920x1200, 32 bpp
> (XEN) [0000003a15e52f98] Disc information:
> (XEN) [0000003a173606a3]  Found 0 MBR signatures
> (XEN) [0000003a18b4ce18]  Found 1 EDD information structures
> (XEN) [0000003a1a8f6353] EFI RAM map:
> (XEN) [0000003a1bb9f456]  [0000000000000000, 000000000009ffff] (usable)
> (XEN) [0000003a1de8d54f]  [00000000000a0000, 00000000000fffff] (reserved)
> (XEN) [0000003a2026fad3]  [0000000000100000, 0000000009bfefff] (usable)
> (XEN) [0000003a2255f258]  [0000000009bff000, 0000000009ffffff] (reserved)
> (XEN) [0000003a249416b9]  [000000000a000000, 000000000a1fffff] (usable)
> (XEN) [0000003a26c2deaa]  [000000000a200000, 000000000a20cfff] (ACPI NVS)
> (XEN) [0000003a290124b1]  [000000000a20d000, 00000000cabc8fff] (usable)
> (XEN) [0000003a2b2ffbce]  [00000000cabc9000, 00000000cc14cfff] (reserved)
> (XEN) [0000003a2d6e36bb]  [00000000cc14d000, 00000000cc195fff] (ACPI data)
> (XEN) [0000003a2fb3feea]  [00000000cc196000, 00000000cc388fff] (ACPI NVS)
> (XEN) [0000003a31f22330]  [00000000cc389000, 00000000cc389fff] (reserved)
> (XEN) [0000003a34304f7f]  [00000000cc38a000, 00000000cc709fff] (ACPI NVS)
> (XEN) [0000003a366e8ff9]  [00000000cc70a000, 00000000cd1fefff] (reserved)
> (XEN) [0000003a38acabf9]  [00000000cd1ff000, 00000000cdffffff] (usable)
> (XEN) [0000003a3adba3f1]  [00000000ce000000, 00000000cfffffff] (reserved)
> (XEN) [0000003a3d19c902]  [00000000f0000000, 00000000f7ffffff] (reserved)
> (XEN) [0000003a3f57f2f0]  [00000000fd000000, 00000000ffffffff] (reserved)
> (XEN) [0000003a419613b2]  [0000000100000000, 000000080f33ffff] (usable)
> (XEN) [0000003a43c4eddf]  [000000080f340000, 00000008501fffff] (reserved)
[...]
> (XEN) [    7.943830] *** Building a PVH Dom0 ***
> (XEN) [    7.955960] d0: identity mappings for IOMMU:
> (XEN) [    7.965494]  [00000000a0, 00000000ff] RW
> (XEN) [    7.974336]  [0000009bff, 0000009fff] RW
> (XEN) [    7.983172]  [00000cabc9, 00000cc14c] RW
> (XEN) [    7.992049]  [00000cc389, 00000cc389] RW
> (XEN) [    8.000890]  [00000cc70a, 00000cd1fe] RW
> (XEN) [    8.010065]  [00000ce000, 00000cffff] RW
> (XEN) [    8.018904]  [00000fd000, 00000fd2ff] RW
> (XEN) [    8.027745]  [00000fd304, 00000febff] RW
> (XEN) [    8.036584]  [00000fec02, 00000fedff] RW
> (XEN) [    8.045546]  [00000fee01, 00000fffff] RW
> (XEN) [    8.054519]  [000080f340, 00008501ff] RW
> (XEN) [    8.064135] 0000:02:00.0: not mapping BAR [fea00, fea03] invalid position
> (XEN) [    8.078698] 0000:03:00.0: not mapping BAR [fe900, fe90f] invalid position
> (XEN) [    8.093260] 0000:03:00.0: not mapping BAR [fe910, fe913] invalid position
> (XEN) [    8.107815] 0000:04:00.0: not mapping BAR [fe700, fe77f] invalid position
> (XEN) [    8.122376] 0000:04:00.3: not mapping BAR [fe500, fe5ff] invalid position
> (XEN) [    8.136936] 0000:04:00.4: not mapping BAR [fe400, fe4ff] invalid position
> (XEN) [    8.151498] 0000:05:00.0: not mapping BAR [fe801, fe801] invalid position
> (XEN) [    8.166056] 0000:05:00.1: not mapping BAR [fe800, fe800] invalid position

Note those messages don't imply that the BARs are not mapped in the
dom0 p2m, for example here all the ranges listed as invalid positions
are already mapped into the p2m and covered by the range:

(XEN) [    8.027745]  [00000fd304, 00000febff] RW

> [    6.378198] nvme nvme0: pci function 0000:02:00.0
> (XEN) [   20.964789] d0v3 unable to fixup memory read from 0xfea0300c size 4: -1
> [    6.387692] a(XEN) [   20.981772] d0v3 unable to fixup memory write to 0xfea03000 size 4: -1

And here the address is somehow not populated in the p2m, despite
being listed as an identity mapped region.  I think the real issue
here is why this address is somehow unmapped from the p2m (or maybe
not even added in the first place?).  Xen does identify it as a region
that must be identity mapped.

It's a fairly wild guess, but can you try if:

https://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=e118fc98e7ae652a188d227bd7ea22f132724150

Makes a difference?  vPCI uses rangesets extensively, so the bug fixed
above could in theory cause unmap operations to remove unintended
regions, and could explain the symptoms you are seeing here.

If that commit doesn't change behavior we would need to figure out why
the identity ranges are either not properly mapped, or unexpectedly
unmapped at a later point.

Thanks, Roger.

Patch

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index b294497a14..afb54d6f0f 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -797,6 +797,9 @@  bool is_memory_hole(mfn_t start, mfn_t end)
         if ( !entry->size )
             continue;
 
+        if ( entry->type > 1 )
+            continue;
+
         /* Do not allow overlaps with any memory range. */
         if ( s <= PFN_DOWN(entry->addr + entry->size - 1) &&
              PFN_DOWN(entry->addr) <= e )