[v2,2/2] iommu/vt-d: switch to common RMRR checker

Message ID	20240207153417.89975-3-roger.pau@citrix.com (mailing list archive)
State	New, archived
Headers	Return-Path: <xen-devel-bounces@lists.xenproject.org>
	Errors-To: xen-devel-bounces@lists.xenproject.org
	Precedence: list
	Sender: "Xen-devel" <xen-devel-bounces@lists.xenproject.org>
	From: Roger Pau Monne <roger.pau@citrix.com>
	To: xen-devel@lists.xenproject.org
	Cc: Roger Pau Monne <roger.pau@citrix.com>, Kevin Tian <kevin.tian@intel.com>, Jan Beulich <jbeulich@suse.com>, Andrew Cooper <andrew.cooper3@citrix.com>
	Subject: [PATCH v2 2/2] iommu/vt-d: switch to common RMRR checker
	Date: Wed, 7 Feb 2024 16:34:17 +0100
	Message-ID: <20240207153417.89975-3-roger.pau@citrix.com>
	In-Reply-To: <20240207153417.89975-1-roger.pau@citrix.com>
	References: <20240207153417.89975-1-roger.pau@citrix.com>
	MIME-Version: 1.0
	Content-Type: text/plain; charset=UTF-8
	Content-Transfer-Encoding: 8bit
Series	iommu/x86: unify RMRR/IVMD range checks
	[v2,0/2] iommu/x86: unify RMRR/IVMD range checks
	[v2,1/2] iommu/x86: introduce a generic IVMD/RMRR range validity helper
	[v2,2/2] iommu/vt-d: switch to common RMRR checker

Roger Pau Monné Feb. 7, 2024, 3:34 p.m. UTC

Use the newly introduced generic unity map checker.

Also drop the message recommending the usage of iommu_inclusive_mapping: the
ranges would end up being mapped anyway even if some of the checks above
failed, regardless of whether iommu_inclusive_mapping is set.  Plus such option
is not supported for PVH, and it's deprecated.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v1:
 - Adjust to changes in the previous patch.
 - Expand commit message.
---
 xen/drivers/passthrough/vtd/dmar.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

Jan Beulich Feb. 12, 2024, 2:38 p.m. UTC | #1

On 07.02.2024 16:34, Roger Pau Monne wrote:
> Use the newly introduced generic unity map checker.
> 
> Also drop the message recommending the usage of iommu_inclusive_mapping: the
> ranges would end up being mapped anyway even if some of the checks above
> failed, regardless of whether iommu_inclusive_mapping is set.  Plus such option
> is not supported for PVH, and it's deprecated.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

Andrew Cooper Feb. 13, 2024, 10:37 p.m. UTC | #2

On 12/02/2024 2:38 pm, Jan Beulich wrote:
> On 07.02.2024 16:34, Roger Pau Monne wrote:
>> Use the newly introduced generic unity map checker.
>>
>> Also drop the message recommending the usage of iommu_inclusive_mapping: the
>> ranges would end up being mapped anyway even if some of the checks above
>> failed, regardless of whether iommu_inclusive_mapping is set.  Plus such option
>> is not supported for PVH, and it's deprecated.
>>
>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

XenRT says no.

It's not clear exactly what's going on here, but the latest resync with
staging (covering only today's pushed changes) suffered 4 failures to
boot, on a mix of Intel hardware (SNB, SKL, SKX and CLX).

All 4 triple-fault-like things were following a log message about an RMRR:

(XEN) RMRR: [0x0e8 ,0x0e8] is not (entirely) in reserved memory

not being in reserved memory.

First of all - fix this printk() to print full addresses, not frame
numbers.  It's obnoxious to cross reference with the E820.

In the example above, 0xe8000 is regular RAM in:

(XEN)  [0000000000000000, 000000000009d3ff] (usable)

In another example,

(XEN) RMRR: [0x4d800 ,0x4ffff] is not (entirely) in reserved memory

is a hole between:

(XEN)  [000000004d3ff000, 000000004d3fffff] (usable)
(XEN)  [00000000e0000000, 00000000efffffff] (reserved)

We should also explicitly render holes when printing the E820, because
that's also unnecessarily hard to spot.

It's very likely something in this series, but the link to Intel might
just be chance of which hardware got selected, and I've got no clue why
there's a reset with no further logging out of Xen...

~Andrew

Jan Beulich Feb. 14, 2024, 7:45 a.m. UTC | #3

On 13.02.2024 23:37, Andrew Cooper wrote:
> On 12/02/2024 2:38 pm, Jan Beulich wrote:
>> On 07.02.2024 16:34, Roger Pau Monne wrote:
>>> Use the newly introduced generic unity map checker.
>>>
>>> Also drop the message recommending the usage of iommu_inclusive_mapping: the
>>> ranges would end up being mapped anyway even if some of the checks above
>>> failed, regardless of whether iommu_inclusive_mapping is set.  Plus such option
>>> is not supported for PVH, and it's deprecated.
>>>
>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> 
> XenRT says no.
> 
> It's not clear exactly what's going on here, but the latest resync with
> staging (covering only today's pushed changes) suffered 4 failures to
> boot, on a mix of Intel hardware (SNB, SKL, SKX and CLX).
> 
> All 4 triple-fault-like things were following a log message about an RMRR:
> 
> (XEN) RMRR: [0x0e8 ,0x0e8] is not (entirely) in reserved memory
> 
> not being in reserved memory.
> 
> 
> First of all - fix this printk() to print full addresses, not frame
> numbers.  It's obnoxious to cross reference with the E820.

Perhaps better indeed. The stray blank before the comma also wants dropping.
And while looking over the patch again, "mfn_t addr;" also isn't very
helpful - the variable would better be named mfn.

> In the example above, 0xe8000 is regular RAM in:
> 
> (XEN)  [0000000000000000, 000000000009d3ff] (usable)

Well, no, E8000 is outside of that range, and I'm inclined to guess it's
the SNB where you saw that. Iirc my SNB has such an RMRR range, too. (Or
was it the Westmere?)
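
For cross-referencing the frame numbers in the RMRR message with the byte
addresses in the E820 listing, a minimal sketch follows. It assumes 4 KiB
pages; pfn_to_addr() and in_range() are hypothetical helpers written for
illustration and are not Xen functions.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12  /* assumption: 4 KiB pages, as on x86 */

/* First byte address covered by a page frame number. */
static uint64_t pfn_to_addr(uint64_t pfn)
{
    return pfn << PAGE_SHIFT;
}

/* Hypothetical helper: is addr inside the inclusive range [start, end]? */
static bool in_range(uint64_t addr, uint64_t start, uint64_t end)
{
    return addr >= start && addr <= end;
}
```

With these, frame 0xe8 maps to address 0xe8000, and
in_range(pfn_to_addr(0xe8), 0x0, 0x9d3ff) is false, which matches the
observation that the RMRR lies above the first usable range.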

> In another example,
> 
> (XEN) RMRR: [0x4d800 ,0x4ffff] is not (entirely) in reserved memory
> 
> is a hole between:
> 
> (XEN)  [000000004d3ff000, 000000004d3fffff] (usable)
> (XEN)  [00000000e0000000, 00000000efffffff] (reserved)
> 
> We should also explicitly render holes when printing the E820, because
> that's also unnecessarily hard to spot.

I disagree here - both "ends" of a hole are easily visible from the
neighboring ranges.

> It's very likely something in this series, but the link to Intel might
> just be chance of which hardware got selected, and I've got no clue why
> there's a reset with no further logging out of Xen...

I second this - even after looking closely at the patches again, I can't
make a connection between them and the observed behavior. Didn't yet look
at what, if anything, osstest may have to say. Do I understand correctly
that the cited log messages are the last sign of life prior to the
systems rebooting?

Jan

Roger Pau Monné Feb. 14, 2024, 8:45 a.m. UTC | #4

On Wed, Feb 14, 2024 at 08:45:28AM +0100, Jan Beulich wrote:
> On 13.02.2024 23:37, Andrew Cooper wrote:
> > On 12/02/2024 2:38 pm, Jan Beulich wrote:
> >> On 07.02.2024 16:34, Roger Pau Monne wrote:
> >>> Use the newly introduced generic unity map checker.
> >>>
> >>> Also drop the message recommending the usage of iommu_inclusive_mapping: the
> >>> ranges would end up being mapped anyway even if some of the checks above
> >>> failed, regardless of whether iommu_inclusive_mapping is set.  Plus such option
> >>> is not supported for PVH, and it's deprecated.
> >>>
> >>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> >> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> > 
> > XenRT says no.
> > 
> > It's not clear exactly what's going on here, but the latest resync with
> > staging (covering only today's pushed changes) suffered 4 failures to
> > boot, on a mix of Intel hardware (SNB, SKL, SKX and CLX).
> > 
> > All 4 triple-fault-like things were following a log message about an RMRR:
> > 
> > (XEN) RMRR: [0x0e8 ,0x0e8] is not (entirely) in reserved memory
> > 
> > not being in reserved memory.
> > 
> > 
> > First of all - fix this printk() to print full addresses, not frame
> > numbers.  It's obnoxious to cross reference with the E820.
> 
> Perhaps better indeed. The stray blank before the comma also wants dropping.
> And while looking over the patch again, "mfn_t addr;" also isn't very
> helpful - the variable would better be named mfn.

I can adjust those in the fix, see below.

> > It's very likely something in this series, but the link to Intel might
> > just be chance of which hardware got selected, and I've got no clue why
> > there's a reset with no further logging out of Xen...
> 
> I second this - even after looking closely at the patches again, I can't
> make a connection between them and the observed behavior. Didn't yet look
> at what, if anything, osstest may have to say. Do I understand correctly
> that the cited log messages are the last sign of life prior to the
> systems rebooting?

I've found it:

    for ( addr = start; mfn_x(addr) <= mfn_x(end); mfn_add(addr, 1) )

Should be:

    for ( addr = start; mfn_x(addr) <= mfn_x(end); addr = mfn_add(addr, 1) )

mfn_add() doesn't modify the parameter.  Will see about making those
helpers __must_check in order to avoid this happening in the future.
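
To illustrate why the original loop never terminates, here is a minimal
sketch of a typesafe frame-number wrapper. The definitions below are
simplified stand-ins and should not be read as Xen's actual headers;
count_frames() is a hypothetical helper, and the GCC/Clang
warn_unused_result attribute is assumed to be roughly what a __must_check
annotation would expand to.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for a typesafe MFN wrapper (assumption: the real
 * definitions in Xen differ in detail). */
typedef struct { uint64_t m; } mfn_t;

static inline mfn_t _mfn(uint64_t m) { return (mfn_t){ m }; }
static inline uint64_t mfn_x(mfn_t mfn) { return mfn.m; }

/* Returns a new value; the argument is passed by value and left untouched.
 * warn_unused_result asks the compiler to warn if the result is discarded. */
__attribute__((warn_unused_result))
static inline mfn_t mfn_add(mfn_t mfn, uint64_t i)
{
    return _mfn(mfn_x(mfn) + i);
}

/* Hypothetical helper: count the frames in the inclusive range [start, end]. */
static unsigned long count_frames(mfn_t start, mfn_t end)
{
    unsigned long n = 0;
    mfn_t mfn;

    assert(mfn_x(start) <= mfn_x(end));

    /* A bare "mfn_add(mfn, 1)" as the step would never advance mfn. */
    for ( mfn = start; mfn_x(mfn) <= mfn_x(end); mfn = mfn_add(mfn, 1) )
        n++;

    return n;
}
```

With the attribute in place, a loop step written as a bare mfn_add(mfn, 1)
should draw a compiler warning instead of silently spinning at boot.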

Andrew Cooper Feb. 14, 2024, 9 a.m. UTC | #5

On 14/02/2024 8:45 am, Roger Pau Monné wrote:
> On Wed, Feb 14, 2024 at 08:45:28AM +0100, Jan Beulich wrote:
>> On 13.02.2024 23:37, Andrew Cooper wrote:
>>> On 12/02/2024 2:38 pm, Jan Beulich wrote:
>>>> On 07.02.2024 16:34, Roger Pau Monne wrote:
>>>>> Use the newly introduced generic unity map checker.
>>>>>
>>>>> Also drop the message recommending the usage of iommu_inclusive_mapping: the
>>>>> ranges would end up being mapped anyway even if some of the checks above
>>>>> failed, regardless of whether iommu_inclusive_mapping is set.  Plus such option
>>>>> is not supported for PVH, and it's deprecated.
>>>>>
>>>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>>> XenRT says no.
>>>
>>> It's not clear exactly what's going on here, but the latest resync with
>>> staging (covering only today's pushed changes) suffered 4 failures to
>>> boot, on a mix of Intel hardware (SNB, SKL, SKX and CLX).
>>>
>>> All 4 triple-fault-like things were following a log message about an RMRR:
>>>
>>> (XEN) RMRR: [0x0e8 ,0x0e8] is not (entirely) in reserved memory
>>>
>>> not being in reserved memory.
>>>
>>>
>>> First of all - fix this printk() to print full addresses, not frame
>>> numbers.  It's obnoxious to cross reference with the E820.
>> Perhaps better indeed. The stray blank before the comma also wants dropping.
>> And while looking over the patch again, "mfn_t addr;" also isn't very
>> helpful - the variable would better be named mfn.
> I can adjust those in the fix, see below.
>
>>> It's very likely something in this series, but the link to Intel might
>>> just be chance of which hardware got selected, and I've got no clue why
>>> there's a reset with no further logging out of Xen...
>> I second this - even after looking closely at the patches again, I can't
>> make a connection between them and the observed behavior. Didn't yet look
>> at what, if anything, osstest may have to say. Do I understand correctly
>> that the cited log messages are the last sign of life prior to the
>> systems rebooting?
> I've found it:
>
>     for ( addr = start; mfn_x(addr) <= mfn_x(end); mfn_add(addr, 1) )
>
> Should be:
>
>     for ( addr = start; mfn_x(addr) <= mfn_x(end); addr = mfn_add(addr, 1) )
>
> mfn_add() doesn't modify the parameter.  Will see about making those
> helpers __must_check in order to avoid this happening in the future.

There's only a single thing in this function which wants an mfn_t. 
Everything else is operating on raw paddr_t's.  I'd suggest converting
types at the start and using plain numbers.

Also, while I hate to nitpick, iommu_unity_region_ok() really ought to
be iommu_check_unity_region().  It is not a predicate (given the
additional fixups), so the function name shouldn't read as one.

Also, the "not (entirely) in reserved memory" line ought to have an ";
adjusting" on the end to make it clear that it's making an adjustment in
light of finding the range not reserved.

Finally, the "can't be converted" error should render type, even if only
in numeric form.

What do we do when there's a region that's marked as RAM?

As to the triple-fault-like nature, given that it's an infinite loop, I
expect that it was our test automation getting unhappy and power cycling
the systems after seeing no signs of starting the installer.

~Andrew

Jan Beulich Feb. 14, 2024, 9:01 a.m. UTC | #6

On 14.02.2024 09:45, Roger Pau Monné wrote:
> On Wed, Feb 14, 2024 at 08:45:28AM +0100, Jan Beulich wrote:
>> On 13.02.2024 23:37, Andrew Cooper wrote:
>>> It's very likely something in this series, but the link to Intel might
>>> just be chance of which hardware got selected, and I've got no clue why
>>> there's a reset with no further logging out of Xen...
>>
>> I second this - even after looking closely at the patches again, I can't
>> make a connection between them and the observed behavior. Didn't yet look
>> at what, if anything, osstest may have to say. Do I understand correctly
>> that the cited log messages are the last sign of life prior to the
>> systems rebooting?
> 
> I've found it:
> 
>     for ( addr = start; mfn_x(addr) <= mfn_x(end); mfn_add(addr, 1) )
> 
> Should be:
> 
>     for ( addr = start; mfn_x(addr) <= mfn_x(end); addr = mfn_add(addr, 1) )
> 
> mfn_add() doesn't modify the parameter.  Will see about making those
> helpers __must_check in order to avoid this happening in the future.

Hmm, yes, it's not the first time this has happened. But even seeing the
flaw I still can't explain the observed behavior: The system ought to
hang then, not instantly reboot?

Jan

Roger Pau Monné Feb. 14, 2024, 9:15 a.m. UTC | #7

On Wed, Feb 14, 2024 at 10:01:43AM +0100, Jan Beulich wrote:
> On 14.02.2024 09:45, Roger Pau Monné wrote:
> > On Wed, Feb 14, 2024 at 08:45:28AM +0100, Jan Beulich wrote:
> >> On 13.02.2024 23:37, Andrew Cooper wrote:
> >>> It's very likely something in this series, but the link to Intel might
> >>> just be chance of which hardware got selected, and I've got no clue why
> >>> there's a reset with no further logging out of Xen...
> >>
> >> I second this - even after looking closely at the patches again, I can't
> >> make a connection between them and the observed behavior. Didn't yet look
> >> at what, if anything, osstest may have to say. Do I understand correctly
> >> that the cited log messages are the last sign of life prior to the
> >> systems rebooting?
> > 
> > I've found it:
> > 
> >     for ( addr = start; mfn_x(addr) <= mfn_x(end); mfn_add(addr, 1) )
> > 
> > Should be:
> > 
> >     for ( addr = start; mfn_x(addr) <= mfn_x(end); addr = mfn_add(addr, 1) )
> > 
> > mfn_add() doesn't modify the parameter.  Will see about making those
> > helpers __must_check in order to avoid this happening in the future.
> 
> Hmm, yes, it's not the first time this has happened. But even seeing the
> flaw I still can't explain the observed behavior: The system ought to
> hang then, not instantly reboot?

AFAICT, it was stuck in a loop without making progress until the CI
controller decided to reboot it.

Thanks, Roger.

Roger Pau Monné Feb. 14, 2024, 9:24 a.m. UTC | #8

On Wed, Feb 14, 2024 at 09:00:25AM +0000, Andrew Cooper wrote:
> On 14/02/2024 8:45 am, Roger Pau Monné wrote:
> > On Wed, Feb 14, 2024 at 08:45:28AM +0100, Jan Beulich wrote:
> >> On 13.02.2024 23:37, Andrew Cooper wrote:
> >>> On 12/02/2024 2:38 pm, Jan Beulich wrote:
> >>>> On 07.02.2024 16:34, Roger Pau Monne wrote:
> >>>>> Use the newly introduced generic unity map checker.
> >>>>>
> >>>>> Also drop the message recommending the usage of iommu_inclusive_mapping: the
> >>>>> ranges would end up being mapped anyway even if some of the checks above
> >>>>> failed, regardless of whether iommu_inclusive_mapping is set.  Plus such option
> >>>>> is not supported for PVH, and it's deprecated.
> >>>>>
> >>>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> >>>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> >>> XenRT says no.
> >>>
> >>> It's not clear exactly what's going on here, but the latest resync with
> >>> staging (covering only today's pushed changes) suffered 4 failures to
> >>> boot, on a mix of Intel hardware (SNB, SKL, SKX and CLX).
> >>>
> >>> All 4 triple-fault-like things were following a log message about an RMRR:
> >>>
> >>> (XEN) RMRR: [0x0e8 ,0x0e8] is not (entirely) in reserved memory
> >>>
> >>> not being in reserved memory.
> >>>
> >>>
> >>> First of all - fix this printk() to print full addresses, not frame
> >>> numbers.  It's obnoxious to cross reference with the E820.
> >> Perhaps better indeed. The stray blank before the comma also wants dropping.
> >> And while looking over the patch again, "mfn_t addr;" also isn't very
> >> helpful - the variable would better be named mfn.
> > I can adjust those in the fix, see below.
> >
> >>> It's very likely something in this series, but the link to Intel might
> >>> just be chance of which hardware got selected, and I've got no clue why
> >>> there's a reset with no further logging out of Xen...
> >> I second this - even after looking closely at the patches again, I can't
> >> make a connection between them and the observed behavior. Didn't yet look
> >> at what, if anything, osstest may have to say. Do I understand correctly
> >> that the cited log messages are the last sign of life prior to the
> >> systems rebooting?
> > I've found it:
> >
> >     for ( addr = start; mfn_x(addr) <= mfn_x(end); mfn_add(addr, 1) )
> >
> > Should be:
> >
> >     for ( addr = start; mfn_x(addr) <= mfn_x(end); addr = mfn_add(addr, 1) )
> >
> > mfn_add() doesn't modify the parameter.  Will see about making those
> > helpers __must_check in order to avoid this happening in the future.
> 
> There's only a single thing in this function which wants an mfn_t. 
> Everything else is operating on raw paddr_t's.  I'd suggest converting
> types at the start and using plain numbers.

I don't have a strong opinion, can do (but then likely as a followup
patch).

> Also, while I hate to nitpick, iommu_unity_region_ok() really ought to
> be iommu_check_unity_region().  It is not a predicate (given the
> additional fixups), so the function name shouldn't read as one.

I'm afraid those two read the same to me.  I can change, but I don't
see how the additional fixups modify how the function should be
named.

> Also, the "not (entirely) in reserved memory" line ought to have an ";
> adjusting" on the end to make it clear that it's making an adjustment in
> light of finding the range not reserved.
> 
> Finally, the "can't be converted" error should render type, even if only
> in numeric form.

This was all inherited from the previous IVMD code.

> What do we do when there's a region that's marked as RAM?

We fail to initialize the IOMMU, which is what we did previously.  In
v1 of this series there was a further patch that would panic Xen if
such overlap was found.  That however raises the question if we need
to parse IVMD/RMRR regions even when the IOMMU is disabled, so that
the panic would also be triggered even when not using the IOMMU (as
the device will still be accessing the regions in the RMRR/IVMD
ranges).

Thanks, Roger.

Andrew Cooper Feb. 14, 2024, 10:11 a.m. UTC | #9

On 14/02/2024 8:45 am, Roger Pau Monné wrote:
> I've found it:
>
>     for ( addr = start; mfn_x(addr) <= mfn_x(end); mfn_add(addr, 1) )
>
> Should be:
>
>     for ( addr = start; mfn_x(addr) <= mfn_x(end); addr = mfn_add(addr, 1) )

Coverity did end up spotting this.

> New defect(s) Reported-by: Coverity Scan
> Showing 1 of 1 defect(s)
>
>
> ** CID 1592056:  Incorrect expression  (USELESS_CALL)
>
>
> ________________________________________________________________________________________________________
> *** CID 1592056:  Incorrect expression  (USELESS_CALL)
> /xen/drivers/passthrough/x86/iommu.c: 807 in iommu_unity_region_ok()
> 801             return true;
> 802     
> 803         printk(XENLOG_WARNING
> 804                "%s: [%#" PRI_mfn " ,%#" PRI_mfn "] is not (entirely) in reserved memory\n",
> 805                prefix, mfn_x(start), mfn_x(end));
> 806     
>>>>     CID 1592056:  Incorrect expression  (USELESS_CALL)
>>>>     Calling "mfn_add(addr, 1UL)" is only useful for its return value, which is ignored.
> 807         for ( addr = start; mfn_x(addr) <= mfn_x(end); mfn_add(addr, 1) )
> 808         {
> 809             unsigned int type = page_get_ram_type(addr);
> 810     
> 811             if ( type == RAM_TYPE_UNKNOWN )
> 812             {


~Andrew
