diff mbox series

[for-4.18,v3] x86/pvh: fix identity mapping of low 1MB

Message ID 20231017082907.14455-1-roger.pau@citrix.com (mailing list archive)
State New, archived

Commit Message

Roger Pau Monné Oct. 17, 2023, 8:29 a.m. UTC
The mapping of memory regions below the 1MB mark was all done by the PVH dom0
builder code, causing the region to be avoided by the arch specific IOMMU
hardware domain initialization code.  That led to the IOMMU being enabled
without reserved regions in the low 1MB identity mapped in the p2m for PVH
hardware domains.  Firmware which happens to be missing RMRR/IVMD ranges
describing E820 reserved regions in the low 1MB would transiently trigger IOMMU
faults until the p2m is populated by the PVH dom0 builder:

AMD-Vi: IO_PAGE_FAULT: 0000:00:13.1 d0 addr 00000000000eb380 flags 0x20 RW
AMD-Vi: IO_PAGE_FAULT: 0000:00:13.1 d0 addr 00000000000eb340 flags 0
AMD-Vi: IO_PAGE_FAULT: 0000:00:13.2 d0 addr 00000000000ea1c0 flags 0
AMD-Vi: IO_PAGE_FAULT: 0000:00:14.5 d0 addr 00000000000eb480 flags 0x20 RW
AMD-Vi: IO_PAGE_FAULT: 0000:00:12.0 d0 addr 00000000000eb080 flags 0x20 RW
AMD-Vi: IO_PAGE_FAULT: 0000:00:14.5 d0 addr 00000000000eb400 flags 0
AMD-Vi: IO_PAGE_FAULT: 0000:00:12.0 d0 addr 00000000000eb040 flags 0

Those errors have been observed on the osstest pinot{0,1} boxes (AMD Fam15h
Opteron(tm) Processor 3350 HE).

Rely on the IOMMU arch init code to create any identity mappings for reserved
regions in the low 1MB range (like it already does for reserved regions
elsewhere), and leave the mapping of any holes to be performed by the dom0
builder code.

Fixes: 6b4f6a31ace1 ('x86/PVH: de-duplicate mappings for first Mb of Dom0 memory')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v2:
 - Leave the identity mapping of holes in the low 1MB.

Changes since v1:
 - Reword commit message.
---
 xen/arch/x86/hvm/dom0_build.c       | 6 +++---
 xen/drivers/passthrough/x86/iommu.c | 8 +-------
 2 files changed, 4 insertions(+), 10 deletions(-)
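
The ordering the commit message describes can be sketched as a small standalone model. This is only an illustration under stated assumptions, not Xen code: `model_iommu_hwdom_init()`, `model_populate_p2m()`, the `region` and `p2m_kind` enums and the `ram_base` parameter are all hypothetical names invented here. The first function stands in for the IOMMU arch init pass, which identity maps reserved pages; the second stands in for the dom0 builder, which populates RAM and then identity maps whatever is still unmapped, checking the same invariant as the ASSERT() in pvh_populate_p2m().

```c
/* Toy model only: none of these names exist in Xen. */
#include <assert.h>

#define MODEL_PAGES 256u /* 1MB of 4K pages, the same count as MB1_PAGES */

enum region { REGION_HOLE, REGION_RAM, REGION_RESERVED };
enum p2m_kind { P2M_NONE, P2M_RAM, P2M_MMIO };

struct p2m_entry {
    enum p2m_kind kind;
    unsigned long mfn;
};

/* Pass 1 (IOMMU init stand-in): identity map every reserved page. */
static void model_iommu_hwdom_init(struct p2m_entry *p2m,
                                   const enum region *map)
{
    unsigned long i;

    for ( i = 0; i < MODEL_PAGES; i++ )
        if ( map[i] == REGION_RESERVED )
        {
            p2m[i].kind = P2M_MMIO;
            p2m[i].mfn = i;
        }
}

/*
 * Pass 2 (dom0 builder stand-in): back RAM pages with memory starting at
 * the hypothetical ram_base, then identity map anything still unmapped.
 * Returns the number of pages that had to be filled in by this pass.
 */
static unsigned int model_populate_p2m(struct p2m_entry *p2m,
                                       const enum region *map,
                                       unsigned long ram_base)
{
    unsigned long i;
    unsigned int filled = 0;

    for ( i = 0; i < MODEL_PAGES; i++ )
        if ( map[i] == REGION_RAM )
        {
            p2m[i].kind = P2M_RAM;
            p2m[i].mfn = ram_base + i;
        }

    for ( i = 0; i < MODEL_PAGES; i++ )
    {
        if ( p2m[i].kind == P2M_NONE )
        {
            p2m[i].kind = P2M_MMIO;
            p2m[i].mfn = i;
            filled++;
        }
        else
            /* Already set: must be RAM or an identity mapping. */
            assert(p2m[i].kind == P2M_RAM || p2m[i].mfn == i);
    }

    return filled;
}
```

In this model the page containing the faulting address 0xeb380 (page 0xeb) is already identity mapped after the first pass, provided the memory map marks it reserved, so it never depends on the second pass having run.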

Comments

Henry Wang Oct. 18, 2023, 5:09 a.m. UTC | #1
Hi Roger,

> On Oct 17, 2023, at 16:29, Roger Pau Monne <roger.pau@citrix.com> wrote:
> 
> The mapping of memory regions below the 1MB mark was all done by the PVH dom0
> builder code, causing the region to be avoided by the arch specific IOMMU
> hardware domain initialization code.  That led to the IOMMU being enabled
> without reserved regions in the low 1MB identity mapped in the p2m for PVH
> hardware domains.  Firmware which happens to be missing RMRR/IVMD ranges
> describing E820 reserved regions in the low 1MB would transiently trigger IOMMU
> faults until the p2m is populated by the PVH dom0 builder:
> 
> AMD-Vi: IO_PAGE_FAULT: 0000:00:13.1 d0 addr 00000000000eb380 flags 0x20 RW
> AMD-Vi: IO_PAGE_FAULT: 0000:00:13.1 d0 addr 00000000000eb340 flags 0
> AMD-Vi: IO_PAGE_FAULT: 0000:00:13.2 d0 addr 00000000000ea1c0 flags 0
> AMD-Vi: IO_PAGE_FAULT: 0000:00:14.5 d0 addr 00000000000eb480 flags 0x20 RW
> AMD-Vi: IO_PAGE_FAULT: 0000:00:12.0 d0 addr 00000000000eb080 flags 0x20 RW
> AMD-Vi: IO_PAGE_FAULT: 0000:00:14.5 d0 addr 00000000000eb400 flags 0
> AMD-Vi: IO_PAGE_FAULT: 0000:00:12.0 d0 addr 00000000000eb040 flags 0
> 
> Those errors have been observed on the osstest pinot{0,1} boxes (AMD Fam15h
> Opteron(tm) Processor 3350 HE).
> 
> Rely on the IOMMU arch init code to create any identity mappings for reserved
> regions in the low 1MB range (like it already does for reserved regions
> elsewhere), and leave the mapping of any holes to be performed by the dom0
> builder code.
> 
> Fixes: 6b4f6a31ace1 ('x86/PVH: de-duplicate mappings for first Mb of Dom0 memory')
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Release-acked-by: Henry Wang <Henry.Wang@arm.com>

Kind regards,
Henry

Jan Beulich Oct. 18, 2023, 3:11 p.m. UTC | #2
On 17.10.2023 10:29, Roger Pau Monne wrote:
> The mapping of memory regions below the 1MB mark was all done by the PVH dom0
> builder code, causing the region to be avoided by the arch specific IOMMU
> hardware domain initialization code.  That led to the IOMMU being enabled
> without reserved regions in the low 1MB identity mapped in the p2m for PVH
> hardware domains.  Firmware which happens to be missing RMRR/IVMD ranges
> describing E820 reserved regions in the low 1MB would transiently trigger IOMMU
> faults until the p2m is populated by the PVH dom0 builder:
> 
> AMD-Vi: IO_PAGE_FAULT: 0000:00:13.1 d0 addr 00000000000eb380 flags 0x20 RW
> AMD-Vi: IO_PAGE_FAULT: 0000:00:13.1 d0 addr 00000000000eb340 flags 0
> AMD-Vi: IO_PAGE_FAULT: 0000:00:13.2 d0 addr 00000000000ea1c0 flags 0
> AMD-Vi: IO_PAGE_FAULT: 0000:00:14.5 d0 addr 00000000000eb480 flags 0x20 RW
> AMD-Vi: IO_PAGE_FAULT: 0000:00:12.0 d0 addr 00000000000eb080 flags 0x20 RW
> AMD-Vi: IO_PAGE_FAULT: 0000:00:14.5 d0 addr 00000000000eb400 flags 0
> AMD-Vi: IO_PAGE_FAULT: 0000:00:12.0 d0 addr 00000000000eb040 flags 0
> 
> Those errors have been observed on the osstest pinot{0,1} boxes (AMD Fam15h
> Opteron(tm) Processor 3350 HE).
> 
> Rely on the IOMMU arch init code to create any identity mappings for reserved
> regions in the low 1MB range (like it already does for reserved regions
> elsewhere), and leave the mapping of any holes to be performed by the dom0
> builder code.
> 
> Fixes: 6b4f6a31ace1 ('x86/PVH: de-duplicate mappings for first Mb of Dom0 memory')
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>
with one suggestion:

> --- a/xen/arch/x86/hvm/dom0_build.c
> +++ b/xen/arch/x86/hvm/dom0_build.c
> @@ -449,7 +449,7 @@ static int __init pvh_populate_p2m(struct domain *d)
>          }
>      }
>  
> -    /* Non-RAM regions of space below 1MB get identity mapped. */
> +    /* Identity map everything below 1MB that's not already mapped. */
>      for ( i = rc = 0; i < MB1_PAGES; ++i )
>      {
>          p2m_type_t p2mt;
> @@ -459,8 +459,8 @@ static int __init pvh_populate_p2m(struct domain *d)
>              rc = set_mmio_p2m_entry(d, _gfn(i), _mfn(i), PAGE_ORDER_4K);
>          else
>              /*
> -             * If the p2m entry is already set it must belong to a RMRR and
> -             * already be identity mapped, or be a RAM region.
> +             * If the p2m entry is already set it must belong to a RMRR/IVMD or
> +             * reserved region and be identity mapped, or else be a RAM region.
>               */
>              ASSERT(p2mt == p2m_ram_rw || mfn_eq(mfn, _mfn(i)));

Would you mind wording the comment slightly differently, e.g.

"If the p2m entry is already set it must belong to a reserved region
 (e.g. RMRR/IVMD) and be identity mapped, or else be a RAM region."

This is because such RMRR/IVMD regions are required to be in reserved
ranges anyway.

Jan

Roger Pau Monné Oct. 18, 2023, 4:12 p.m. UTC | #3
On Wed, Oct 18, 2023 at 05:11:58PM +0200, Jan Beulich wrote:
> On 17.10.2023 10:29, Roger Pau Monne wrote:
> > The mapping of memory regions below the 1MB mark was all done by the PVH dom0
> > builder code, causing the region to be avoided by the arch specific IOMMU
> > hardware domain initialization code.  That led to the IOMMU being enabled
> > without reserved regions in the low 1MB identity mapped in the p2m for PVH
> > hardware domains.  Firmware which happens to be missing RMRR/IVMD ranges
> > describing E820 reserved regions in the low 1MB would transiently trigger IOMMU
> > faults until the p2m is populated by the PVH dom0 builder:
> > 
> > AMD-Vi: IO_PAGE_FAULT: 0000:00:13.1 d0 addr 00000000000eb380 flags 0x20 RW
> > AMD-Vi: IO_PAGE_FAULT: 0000:00:13.1 d0 addr 00000000000eb340 flags 0
> > AMD-Vi: IO_PAGE_FAULT: 0000:00:13.2 d0 addr 00000000000ea1c0 flags 0
> > AMD-Vi: IO_PAGE_FAULT: 0000:00:14.5 d0 addr 00000000000eb480 flags 0x20 RW
> > AMD-Vi: IO_PAGE_FAULT: 0000:00:12.0 d0 addr 00000000000eb080 flags 0x20 RW
> > AMD-Vi: IO_PAGE_FAULT: 0000:00:14.5 d0 addr 00000000000eb400 flags 0
> > AMD-Vi: IO_PAGE_FAULT: 0000:00:12.0 d0 addr 00000000000eb040 flags 0
> > 
> > Those errors have been observed on the osstest pinot{0,1} boxes (AMD Fam15h
> > Opteron(tm) Processor 3350 HE).
> > 
> > Rely on the IOMMU arch init code to create any identity mappings for reserved
> > regions in the low 1MB range (like it already does for reserved regions
> > elsewhere), and leave the mapping of any holes to be performed by the dom0
> > builder code.
> > 
> > Fixes: 6b4f6a31ace1 ('x86/PVH: de-duplicate mappings for first Mb of Dom0 memory')
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> 
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> with one suggestion:
> 
> > --- a/xen/arch/x86/hvm/dom0_build.c
> > +++ b/xen/arch/x86/hvm/dom0_build.c
> > @@ -449,7 +449,7 @@ static int __init pvh_populate_p2m(struct domain *d)
> >          }
> >      }
> >  
> > -    /* Non-RAM regions of space below 1MB get identity mapped. */
> > +    /* Identity map everything below 1MB that's not already mapped. */
> >      for ( i = rc = 0; i < MB1_PAGES; ++i )
> >      {
> >          p2m_type_t p2mt;
> > @@ -459,8 +459,8 @@ static int __init pvh_populate_p2m(struct domain *d)
> >              rc = set_mmio_p2m_entry(d, _gfn(i), _mfn(i), PAGE_ORDER_4K);
> >          else
> >              /*
> > -             * If the p2m entry is already set it must belong to a RMRR and
> > -             * already be identity mapped, or be a RAM region.
> > +             * If the p2m entry is already set it must belong to a RMRR/IVMD or
> > +             * reserved region and be identity mapped, or else be a RAM region.
> >               */
> >              ASSERT(p2mt == p2m_ram_rw || mfn_eq(mfn, _mfn(i)));
> 
> Would you mind wording the comment slightly differently, e.g.
> 
> "If the p2m entry is already set it must belong to a reserved region
>  (e.g. RMRR/IVMD) and be identity mapped, or else be a RAM region."
> 
> This is because such RMRR/IVMD regions are required to be in reserved
> ranges anyway.

IIRC there's an option to provide extra RMRR/IVMD regions on the
command line, and those are not required to be on reserved regions?

Otherwise LGTM, so would you mind adjusting at commit?

Thanks, Roger.

Jan Beulich Oct. 19, 2023, 6:45 a.m. UTC | #4
On 18.10.2023 18:12, Roger Pau Monné wrote:
> On Wed, Oct 18, 2023 at 05:11:58PM +0200, Jan Beulich wrote:
>> On 17.10.2023 10:29, Roger Pau Monne wrote:
>>> The mapping of memory regions below the 1MB mark was all done by the PVH dom0
>>> builder code, causing the region to be avoided by the arch specific IOMMU
>>> hardware domain initialization code.  That led to the IOMMU being enabled
>>> without reserved regions in the low 1MB identity mapped in the p2m for PVH
>>> hardware domains.  Firmware which happens to be missing RMRR/IVMD ranges
>>> describing E820 reserved regions in the low 1MB would transiently trigger IOMMU
>>> faults until the p2m is populated by the PVH dom0 builder:
>>>
>>> AMD-Vi: IO_PAGE_FAULT: 0000:00:13.1 d0 addr 00000000000eb380 flags 0x20 RW
>>> AMD-Vi: IO_PAGE_FAULT: 0000:00:13.1 d0 addr 00000000000eb340 flags 0
>>> AMD-Vi: IO_PAGE_FAULT: 0000:00:13.2 d0 addr 00000000000ea1c0 flags 0
>>> AMD-Vi: IO_PAGE_FAULT: 0000:00:14.5 d0 addr 00000000000eb480 flags 0x20 RW
>>> AMD-Vi: IO_PAGE_FAULT: 0000:00:12.0 d0 addr 00000000000eb080 flags 0x20 RW
>>> AMD-Vi: IO_PAGE_FAULT: 0000:00:14.5 d0 addr 00000000000eb400 flags 0
>>> AMD-Vi: IO_PAGE_FAULT: 0000:00:12.0 d0 addr 00000000000eb040 flags 0
>>>
>>> Those errors have been observed on the osstest pinot{0,1} boxes (AMD Fam15h
>>> Opteron(tm) Processor 3350 HE).
>>>
>>> Rely on the IOMMU arch init code to create any identity mappings for reserved
>>> regions in the low 1MB range (like it already does for reserved regions
>>> elsewhere), and leave the mapping of any holes to be performed by the dom0
>>> builder code.
>>>
>>> Fixes: 6b4f6a31ace1 ('x86/PVH: de-duplicate mappings for first Mb of Dom0 memory')
>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>> with one suggestion:
>>
>>> --- a/xen/arch/x86/hvm/dom0_build.c
>>> +++ b/xen/arch/x86/hvm/dom0_build.c
>>> @@ -449,7 +449,7 @@ static int __init pvh_populate_p2m(struct domain *d)
>>>          }
>>>      }
>>>  
>>> -    /* Non-RAM regions of space below 1MB get identity mapped. */
>>> +    /* Identity map everything below 1MB that's not already mapped. */
>>>      for ( i = rc = 0; i < MB1_PAGES; ++i )
>>>      {
>>>          p2m_type_t p2mt;
>>> @@ -459,8 +459,8 @@ static int __init pvh_populate_p2m(struct domain *d)
>>>              rc = set_mmio_p2m_entry(d, _gfn(i), _mfn(i), PAGE_ORDER_4K);
>>>          else
>>>              /*
>>> -             * If the p2m entry is already set it must belong to a RMRR and
>>> -             * already be identity mapped, or be a RAM region.
>>> +             * If the p2m entry is already set it must belong to a RMRR/IVMD or
>>> +             * reserved region and be identity mapped, or else be a RAM region.
>>>               */
>>>              ASSERT(p2mt == p2m_ram_rw || mfn_eq(mfn, _mfn(i)));
>>
>> Would you mind wording the comment slightly differently, e.g.
>>
>> "If the p2m entry is already set it must belong to a reserved region
>>  (e.g. RMRR/IVMD) and be identity mapped, or else be a RAM region."
>>
>> This is because such RMRR/IVMD regions are required to be in reserved
>> ranges anyway.
> 
> IIRC there's an option to provide extra RMRR/IVMD regions on the
> command line, and those are not required to be on reserved regions?

On AMD we force-reserve such regions, and I think it is a mistake that
we don't on VT-d.

> Otherwise LGTM, so would you mind adjusting at commit?

Sure.

Jan
Patch

diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index bc0e290db612..b8c27c1b1646 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -449,7 +449,7 @@  static int __init pvh_populate_p2m(struct domain *d)
         }
     }
 
-    /* Non-RAM regions of space below 1MB get identity mapped. */
+    /* Identity map everything below 1MB that's not already mapped. */
     for ( i = rc = 0; i < MB1_PAGES; ++i )
     {
         p2m_type_t p2mt;
@@ -459,8 +459,8 @@  static int __init pvh_populate_p2m(struct domain *d)
             rc = set_mmio_p2m_entry(d, _gfn(i), _mfn(i), PAGE_ORDER_4K);
         else
             /*
-             * If the p2m entry is already set it must belong to a RMRR and
-             * already be identity mapped, or be a RAM region.
+             * If the p2m entry is already set it must belong to a RMRR/IVMD or
+             * reserved region and be identity mapped, or else be a RAM region.
              */
             ASSERT(p2mt == p2m_ram_rw || mfn_eq(mfn, _mfn(i)));
         put_gfn(d, i);
diff --git a/xen/drivers/passthrough/x86/iommu.c b/xen/drivers/passthrough/x86/iommu.c
index c85549ccad6e..857dccb6a465 100644
--- a/xen/drivers/passthrough/x86/iommu.c
+++ b/xen/drivers/passthrough/x86/iommu.c
@@ -400,13 +400,7 @@  void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
     max_pfn = (GB(4) >> PAGE_SHIFT) - 1;
     top = max(max_pdx, pfn_to_pdx(max_pfn) + 1);
 
-    /*
-     * First Mb will get mapped in one go by pvh_populate_p2m(). Avoid
-     * setting up potentially conflicting mappings here.
-     */
-    start = paging_mode_translate(d) ? PFN_DOWN(MB(1)) : 0;
-
-    for ( i = pfn_to_pdx(start), count = 0; i < top; )
+    for ( i = 0, start = 0, count = 0; i < top; )
     {
         unsigned long pfn = pdx_to_pfn(i);
         unsigned int perms = hwdom_iommu_map(d, pfn, max_pfn);
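
As a rough illustration of what the removed `start` computation meant for the faulting addresses in the log, the arithmetic can be checked in isolation. The macros below are local stand-ins that mirror the names used in the hunk, assuming 4K pages; `old_scan_start()`, `new_scan_start()` and `page_was_skipped()` are hypothetical helpers, and the `translated` flag stands in for paging_mode_translate(d).

```c
/* Standalone arithmetic sketch; local stand-ins, not the Xen headers. */
#include <stdbool.h>

#define PAGE_SHIFT 12                       /* assumes 4K pages */
#define MB(x) ((unsigned long long)(x) << 20)
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)

/* Before the patch: translated domains started the scan at the 1MB mark. */
static unsigned long old_scan_start(bool translated)
{
    return translated ? (unsigned long)PFN_DOWN(MB(1)) : 0;
}

/* After the patch: the scan always starts at page 0. */
static unsigned long new_scan_start(bool translated)
{
    (void)translated;
    return 0;
}

/* Was the page holding addr below the old starting point of the scan? */
static bool page_was_skipped(unsigned long long addr, bool translated)
{
    return PFN_DOWN(addr) < old_scan_start(translated);
}
```

With these definitions the old starting point for a translated domain is page 256, so the page holding 0xeb380 (page 0xeb, i.e. 235) fell below it and was not considered by the IOMMU init loop.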