iommu/amd-vi: do not error if device referenced in IVMD is not behind any IOMMU

Message ID 20241008104706.74001-1-roger.pau@citrix.com (mailing list archive)
State New

Commit Message

Roger Pau Monné Oct. 8, 2024, 10:47 a.m. UTC
The IVMD table contains restrictions about memory that must be mandatorily
assigned to devices (and the permissions it should use), or memory that should
never be accessible to devices.

Some hardware, however, contains ranges in IVMD that reference devices outside
of the IVHD tables (in other words, devices not behind any IOMMU).  Such a
mismatch causes Xen to fail in register_range_for_device(), ultimately leading
to the IOMMU being disabled, and to Xen crashing, as x2APIC support might
already be enabled and relying on the IOMMU functionality.

Relax IVMD parsing: allow IVMD blocks to reference devices not assigned to any
IOMMU.  It's impossible for Xen to fulfill the requirement in the IVMD block if
the device is not behind any IOMMU, but it's no worse than booting without
IOMMU support, and thus not parsing ACPI IVRS in the first place.

Reported-by: Willi Junga <xenproject@ymy.be>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/drivers/passthrough/amd/iommu_acpi.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Jan Beulich Oct. 8, 2024, 2:01 p.m. UTC | #1
On 08.10.2024 12:47, Roger Pau Monne wrote:
> IVMD table contains restrictions about memory which must be mandatory assigned
> to devices (and which permissions it should use), or memory that should be
> never accessible to devices.
> 
> Some hardware however contains ranges in IVMD that reference devices outside of
> the IVHD tables (in other words, devices not behind any IOMMU).  Such mismatch
> will cause Xen to fail in register_range_for_device(), ultimately leading to
> the IOMMU being disabled, and Xen crashing as x2APIC support might be already
> enabled and relying on the IOMMU functionality.

I find it hard to believe that on x86 systems with IOMMUs some devices would
be left uncovered by any IOMMU. Is it possible that IVHD is flawed there? In
which case we might rightfully refuse to boot? (Can you share e.g. that
"iommu=debug" output that results from parsing the tables on that system?)

> Relax IVMD parsing: allow IVMD blocks to reference devices not assigned to any
> IOMMU.  It's impossible for Xen to fulfill the requirement in the IVMD block if
> the device is not behind any IOMMU, but it's no worse than booting without
> IOMMU support, and thus not parsing ACPI IVRS in the first place.
> 
> Reported-by: Willi Junga <xenproject@ymy.be>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
>  xen/drivers/passthrough/amd/iommu_acpi.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/amd/iommu_acpi.c b/xen/drivers/passthrough/amd/iommu_acpi.c
> index 3f5508eba049..c416120326c9 100644
> --- a/xen/drivers/passthrough/amd/iommu_acpi.c
> +++ b/xen/drivers/passthrough/amd/iommu_acpi.c
> @@ -248,8 +248,9 @@ static int __init register_range_for_device(
>      iommu = find_iommu_for_device(seg, bdf);
>      if ( !iommu )
>      {
> -        AMD_IOMMU_ERROR("IVMD: no IOMMU for Dev_Id %#x\n", bdf);
> -        return -ENODEV;
> +        AMD_IOMMU_WARN("IVMD: no IOMMU for device %pp - ignoring constrain\n",

I'm not a native speaker, but "constrain" to me can only be a verb (with
"constraint" being the noun). IOW as worded I'm afraid I can't make sense
of the message.

> +                       &PCI_SBDF(seg, bdf));
> +        return 0;
>      }
>      req = ivrs_mappings[bdf].dte_requestor_id;
>  

Further down, parse_ivmd_device_iommu() has somewhat similar code.
Wouldn't that need adjusting similarly then? Or else shouldn't the
adjustment above be accompanied by a comment clarifying that the
behavior is just because of observations on certain hardware?

Jan

Roger Pau Monné Oct. 9, 2024, 8:03 a.m. UTC | #2
On Tue, Oct 08, 2024 at 04:01:28PM +0200, Jan Beulich wrote:
> On 08.10.2024 12:47, Roger Pau Monne wrote:
> > IVMD table contains restrictions about memory which must be mandatory assigned
> > to devices (and which permissions it should use), or memory that should be
> > never accessible to devices.
> > 
> > Some hardware however contains ranges in IVMD that reference devices outside of
> > the IVHD tables (in other words, devices not behind any IOMMU).  Such mismatch
> > will cause Xen to fail in register_range_for_device(), ultimately leading to
> > the IOMMU being disabled, and Xen crashing as x2APIC support might be already
> > enabled and relying on the IOMMU functionality.
> 
> I find it hard to believe that on x86 systems with IOMMUs some devices would
> be left uncovered by any IOMMU. Is it possible that IVHD is flawed there? In
> which case we might rightfully refuse to boot? (Can you share e.g. that
> "iommu=debug" output that results from parsing the tables on that system?)

I'm afraid I don't have any such system to test on myself; however, I
have the contents of the IVRS table:

  ACPI Table Header
------------------------------------------------------------------
Signature          : IVRS
Length             : 0x000001F8
Revision           : 0x02
Checksum           : 0x06
OEM ID             : AMD  
OEM Table ID       : AmdTable
OEM Revision       : 0x00000001
Creator ID         : AMD 
Creator Revision   : 0x00000001
IVinfo             : 0x00203043
	  IVHD
	----------------------------------------------------------------
	Type                  : 0x10
	Flags                 : 0xB0
	Length                : 0x0044
	IOMMU Device ID       : 0x0002
	Capability Offset     : 0x0040
	IOMMU Base Address    : 0x00000000FD200000
	Segment Group         : 0x0000
	IOMMU Info            : 0x0000
	IOMMU Feature Info    : 0x80048F6E
		  Range
		--------------------------------------------------
		Type                  : 0x03
		Start of Range        : 0x0003
		End of Range          : 0xFFFE
		DTE Setting           : 0x00
		  Alias Range
		--------------------------------------------------
		Type                  : 0x43
		Start of Range        : 0xFF00
		End of Range          : 0xFFFF
		DTE Setting           : 0x00
		Source Device ID      : 0x00A5
		  Special Device
		--------------------------------------------------
		Type                  : 0x48
		Device ID             : 0x0000
		DTE Setting           : 0x00
		Source Device ID      : 0x00A0
		Handle                : 0x00
		Variety               : HPET
		  Special Device
		--------------------------------------------------
		Type                  : 0x48
		Device ID             : 0x0000
		DTE Setting           : 0xD7
		Source Device ID      : 0x00A0
		Handle                : 0x21
		Variety               : IOAPIC
		  Special Device
		--------------------------------------------------
		Type                  : 0x48
		Device ID             : 0x0000
		DTE Setting           : 0x00
		Source Device ID      : 0x0001
		Handle                : 0x22
		Variety               : IOAPIC
	  IVHD
	----------------------------------------------------------------
	Type                  : 0x11
	Flags                 : 0x30
	Length                : 0x0054
	IOMMU Device ID       : 0x0002
	Capability Offset     : 0x0040
	IOMMU Base Address    : 0x00000000FD200000
	Segment Group         : 0x0000
	IOMMU Info            : 0x0000
	IOMMU Feature Info    : 0x00048000
		  Range
		--------------------------------------------------
		Type                  : 0x03
		Start of Range        : 0x0003
		End of Range          : 0xFFFE
		DTE Setting           : 0x00
		  Alias Range
		--------------------------------------------------
		Type                  : 0x43
		Start of Range        : 0xFF00
		End of Range          : 0xFFFF
		DTE Setting           : 0x00
		Source Device ID      : 0x00A5
		  Special Device
		--------------------------------------------------
		Type                  : 0x48
		Device ID             : 0x0000
		DTE Setting           : 0x00
		Source Device ID      : 0x00A0
		Handle                : 0x00
		Variety               : HPET
		  Special Device
		--------------------------------------------------
		Type                  : 0x48
		Device ID             : 0x0000
		DTE Setting           : 0xD7
		Source Device ID      : 0x00A0
		Handle                : 0x21
		Variety               : IOAPIC
		  Special Device
		--------------------------------------------------
		Type                  : 0x48
		Device ID             : 0x0000
		DTE Setting           : 0x00
		Source Device ID      : 0x0001
		Handle                : 0x22
		Variety               : IOAPIC
	  IVMD
	----------------------------------------------------------------
	Type                                 : 0x22
	Flags                                : 0x08
	Length                               : 0x0020
	DeviceID                             : 0x0000
	AuxiliaryData                        : 0x0FFF
	Reserved                             : 0x0000000000000000
	IVMD Start Address                   : 0x0000000096191000
	IVMD Memory Block Length             : 0x0000000000000022
	  IVMD
	----------------------------------------------------------------
	Type                                 : 0x22
	Flags                                : 0x08
	Length                               : 0x0020
	DeviceID                             : 0x0000
	AuxiliaryData                        : 0x0FFF
	Reserved                             : 0x0000000000000000
	IVMD Start Address                   : 0x0000000097D9E000
	IVMD Memory Block Length             : 0x0000000000000022
	  IVMD
	----------------------------------------------------------------
	Type                                 : 0x22
	Flags                                : 0x08
	Length                               : 0x0020
	DeviceID                             : 0x0000
	AuxiliaryData                        : 0x0FFF
	Reserved                             : 0x0000000000000000
	IVMD Start Address                   : 0x0000000097D9D000
	IVMD Memory Block Length             : 0x0000000000000022
	  IVHD
	----------------------------------------------------------------
	Type                  : 0x40
	Flags                 : 0x30
	Length                : 0x00D0
	IOMMU Device ID       : 0x0002
	Capability Offset     : 0x0040
	IOMMU Base Address    : 0x00000000FD200000
	Segment Group         : 0x0000
	IOMMU Info            : 0x0000
	IOMMU Feature Info    : 0x00048000
		  Range
		--------------------------------------------------
		Type                  : 0x03
		Start of Range        : 0x0003
		End of Range          : 0xFFFE
		DTE Setting           : 0x00
		  Alias Range
		--------------------------------------------------
		Type                  : 0x43
		Start of Range        : 0xFF00
		End of Range          : 0xFFFF
		DTE Setting           : 0x00
		Source Device ID      : 0x00A5
		  Special Device
		--------------------------------------------------
		Type                  : 0x48
		Device ID             : 0x0000
		DTE Setting           : 0x00
		Source Device ID      : 0x00A0
		Handle                : 0x00
		Variety               : HPET
		  Special Device
		--------------------------------------------------
		Type                  : 0x48
		Device ID             : 0x0000
		DTE Setting           : 0xD7
		Source Device ID      : 0x00A0
		Handle                : 0x21
		Variety               : IOAPIC
		  Special Device
		--------------------------------------------------
		Type                  : 0x48
		Device ID             : 0x0000
		DTE Setting           : 0x00
		Source Device ID      : 0x0001
		Handle                : 0x22
		Variety               : IOAPIC
		  Variable Length ACPI HID Device
		--------------------------------------------------
		Type                  : 0xF0
		Device ID             : 0x00A5
		DTE Setting           : 0x40
		Hardware ID           : AMDI0020
		Extended DTE Setting  : 
		Unique ID Format      : 2
		Unique ID Length      : 9
		Unique ID             : \_SB.FUR0
		  Variable Length ACPI HID Device
		--------------------------------------------------
		Type                  : 0xF0
		Device ID             : 0x00A5
		DTE Setting           : 0x40
		Hardware ID           : AMDI0020
		Extended DTE Setting  : 
		Unique ID Format      : 2
		Unique ID Length      : 9
		Unique ID             : \_SB.FUR1
		  Variable Length ACPI HID Device
		--------------------------------------------------
		Type                  : 0xF0
		Device ID             : 0x00A5
		DTE Setting           : 0x40
		Hardware ID           : AMDI0020
		Extended DTE Setting  : 
		Unique ID Format      : 2
		Unique ID Length      : 9
		Unique ID             : \_SB.FUR2
		  Variable Length ACPI HID Device
		--------------------------------------------------
		Type                  : 0xF0
		Device ID             : 0x00A5
		DTE Setting           : 0x40
		Hardware ID           : AMDI0020
		Extended DTE Setting  : 
		Unique ID Format      : 2
		Unique ID Length      : 9
		Unique ID             : \_SB.FUR3

FWIW, I've checked on one of the AMD server systems we have in the
lab, and the IVHD entries are fairly similar to the ones here, as
neither the PCI Host Bridge nor the IOMMU is covered by any IVHD
block.  That system however doesn't have any IVMD blocks.
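
To make the mismatch concrete, below is a minimal standalone sketch (not
Xen code; the struct and helper names are made up for illustration).  It
assumes, going by the dump above, that the plain IVHD range covers device
IDs 0x0003-0xFFFE and that a type 0x22 IVMD block spans DeviceID through
AuxiliaryData, i.e. 0x0000-0x0FFF.  The alias range and the special device
entries are deliberately ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one IVHD "range" entry; not a Xen structure. */
struct ivhd_range {
    uint16_t first; /* first device ID covered */
    uint16_t last;  /* last device ID covered (inclusive) */
};

/* True if device ID bdf lies inside the IVHD range. */
static bool ivhd_covers(const struct ivhd_range *r, uint16_t bdf)
{
    return bdf >= r->first && bdf <= r->last;
}

/* Count the device IDs in [first, last] that the IVHD range does not cover. */
static unsigned int count_uncovered(const struct ivhd_range *r,
                                    uint16_t first, uint16_t last)
{
    unsigned int bdf, uncovered = 0;

    assert(first <= last);

    /* bdf is wider than 16 bits so the loop terminates for last == 0xFFFF. */
    for ( bdf = first; bdf <= last; bdf++ )
        if ( !ivhd_covers(r, (uint16_t)bdf) )
            uncovered++;

    return uncovered;
}
```

With the values from the dump, count_uncovered() over 0x0000-0x0FFF against
the 0x0003-0xFFFE range reports the three device IDs 0x0000-0x0002 as having
no IOMMU, which is where the lookup for the IVMD range fails.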

> > Relax IVMD parsing: allow IVMD blocks to reference devices not assigned to any
> > IOMMU.  It's impossible for Xen to fulfill the requirement in the IVMD block if
> > the device is not behind any IOMMU, but it's no worse than booting without
> > IOMMU support, and thus not parsing ACPI IVRS in the first place.
> > 
> > Reported-by: Willi Junga <xenproject@ymy.be>
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> >  xen/drivers/passthrough/amd/iommu_acpi.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/xen/drivers/passthrough/amd/iommu_acpi.c b/xen/drivers/passthrough/amd/iommu_acpi.c
> > index 3f5508eba049..c416120326c9 100644
> > --- a/xen/drivers/passthrough/amd/iommu_acpi.c
> > +++ b/xen/drivers/passthrough/amd/iommu_acpi.c
> > @@ -248,8 +248,9 @@ static int __init register_range_for_device(
> >      iommu = find_iommu_for_device(seg, bdf);
> >      if ( !iommu )
> >      {
> > -        AMD_IOMMU_ERROR("IVMD: no IOMMU for Dev_Id %#x\n", bdf);
> > -        return -ENODEV;
> > +        AMD_IOMMU_WARN("IVMD: no IOMMU for device %pp - ignoring constrain\n",
> 
> I'm not a native speaker, but "constrain" to me can only be a verb (with
> "constraint" being the noun). IOW as worded I'm afraid I can't make sense
> of the message.

Indeed, sorry for the typo.

> > +                       &PCI_SBDF(seg, bdf));
> > +        return 0;
> >      }
> >      req = ivrs_mappings[bdf].dte_requestor_id;
> >  
> 
> Down from here in parse_ivmd_device_iommu() is somewhat similar code.
> Wouldn't that need adjusting similarly then? Or else shouldn't the
> adjustment above be accompanied by a comment clarifying that the
> behavior is just because of observations on certain hardware?

Hm, I think that one is bogus and should be removed.  According to my
copy of the AMD-Vi spec (48882—Rev 3.08-PUB—Oct 2023), the IVMD type
can only be:

20h=all peripherals
21h=specified peripheral
22h=peripheral range

So type 23h (ACPI_IVRS_TYPE_MEMORY_IOMMU) is not a valid type for the
IVMD blocks.
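
As a rough sketch of what that implies for a parser, the check below
accepts only the three types listed above.  The enumerator and function
names are illustrative, not the identifiers Xen uses; only the numeric
values come from the list.

```c
#include <stdbool.h>
#include <stdint.h>

/* IVMD block types as listed above; the enumerator names are made up. */
enum ivmd_type {
    IVMD_TYPE_ALL    = 0x20, /* all peripherals */
    IVMD_TYPE_SINGLE = 0x21, /* specified peripheral */
    IVMD_TYPE_RANGE  = 0x22, /* peripheral range */
};

/* True only for the three IVMD types named in the list above. */
static bool ivmd_type_is_valid(uint8_t type)
{
    switch ( type )
    {
    case IVMD_TYPE_ALL:
    case IVMD_TYPE_SINGLE:
    case IVMD_TYPE_RANGE:
        return true;

    default:
        return false;
    }
}
```

Under that reading, ivmd_type_is_valid(0x23) is false, so a block of that
type would be rejected rather than dispatched to a dedicated handler.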

Thanks, Roger.

Jan Beulich Oct. 9, 2024, 10:52 a.m. UTC | #3
On 09.10.2024 10:03, Roger Pau Monné wrote:
> On Tue, Oct 08, 2024 at 04:01:28PM +0200, Jan Beulich wrote:
>> On 08.10.2024 12:47, Roger Pau Monne wrote:
>>> IVMD table contains restrictions about memory which must be mandatory assigned
>>> to devices (and which permissions it should use), or memory that should be
>>> never accessible to devices.
>>>
>>> Some hardware however contains ranges in IVMD that reference devices outside of
>>> the IVHD tables (in other words, devices not behind any IOMMU).  Such mismatch
>>> will cause Xen to fail in register_range_for_device(), ultimately leading to
>>> the IOMMU being disabled, and Xen crashing as x2APIC support might be already
>>> enabled and relying on the IOMMU functionality.
>>
>> I find it hard to believe that on x86 systems with IOMMUs some devices would
>> be left uncovered by any IOMMU. Is it possible that IVHD is flawed there? In
>> which case we might rightfully refuse to boot? (Can you share e.g. that
>> "iommu=debug" output that results from parsing the tables on that system?)
> 
> I'm afraid I don't have any of such systems to test myself, however I
> have the contents of IVRS:
> 
> [IVRS dump snipped]
> 
> FWIW, I've checked on one of the AMD server systems we have on the
> lab, and the IVHD entries are fairly similar to the ones here, as
> neither the PCI Host Bridge, nor the IOMMU are covered by any IVHD
> block.  That system however doesn't have any IVMD blocks.

Mine are a little different. The Dinar (Fam15) has an IVHD entry just
for the range 0-2 (host bridge, <nothing>, IOMMU). The Rome (Fam17)
has an IVHD entry just for 0 (host bridge), but not for the IOMMU. I
think it is entirely reasonable for host bridge(s) and IOMMU(s) to not
be covered by any IVHD. They aren't devices that would require
servicing by an IOMMU.
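
For readers mapping the 16-bit device IDs to PCI functions, here is a
small sketch of the standard bus/device/function split (helper names are
made up for illustration): 0x0002 decodes to 00:00.2 and 0x00A5 to 00:14.5.

```c
#include <stdint.h>

/* Standard PCI encoding: bits 15-8 bus, bits 7-3 device, bits 2-0 function. */
static inline uint8_t bdf_bus(uint16_t bdf)
{
    return (uint8_t)(bdf >> 8);
}

static inline uint8_t bdf_dev(uint16_t bdf)
{
    return (uint8_t)((bdf >> 3) & 0x1f);
}

static inline uint8_t bdf_fn(uint16_t bdf)
{
    return (uint8_t)(bdf & 0x7);
}
```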

Looking at the code I think we want to do things a little differently
though: Pull find_iommu_for_device() out of register_range_for_device()
and have parse_ivmd_device_range() do the skipping when there's no
IOMMU for a device. Plus error when no device in the range is covered
by an IOMMU, or if any two devices are covered by different IOMMUs.
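
A rough standalone sketch of that range handling follows.  It is not the
actual patch: struct iommu here is a dummy, the lookup callback merely
stands in for find_iommu_for_device(), toy_lookup() models the single
0x0003-0xFFFE IVHD range from the dump, and the specific error codes are
assumptions.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Dummy type for illustration; the real struct lives in Xen. */
struct iommu { int id; };

/* Hypothetical callback standing in for find_iommu_for_device(). */
typedef const struct iommu *(*iommu_lookup_fn)(uint16_t seg, uint16_t bdf);

/*
 * Walk [first, last], skipping devices without an IOMMU.  Fail if no
 * device is covered at all, or if two devices sit behind different IOMMUs.
 */
static int resolve_range_iommu(iommu_lookup_fn lookup, uint16_t seg,
                               uint16_t first, uint16_t last,
                               const struct iommu **out)
{
    const struct iommu *found = NULL;
    unsigned int bdf; /* wider than 16 bits so last == 0xFFFF terminates */

    for ( bdf = first; bdf <= last; bdf++ )
    {
        const struct iommu *cur = lookup(seg, (uint16_t)bdf);

        if ( !cur )
            continue;           /* not behind any IOMMU: skip */

        if ( found && found != cur )
            return -EINVAL;     /* assumed code: range spans two IOMMUs */

        found = cur;
    }

    if ( !found )
        return -ENODEV;         /* assumed code: nothing in range covered */

    *out = found;
    return 0;
}

/* Toy model of the dump: one IOMMU covering device IDs 0x0003-0xFFFE. */
static const struct iommu toy_iommu = { 0 };

static const struct iommu *toy_lookup(uint16_t seg, uint16_t bdf)
{
    (void)seg;
    return (bdf >= 0x0003 && bdf <= 0xFFFE) ? &toy_iommu : NULL;
}
```

With toy_lookup(), the IVMD range 0x0000-0x0FFF resolves to the one IOMMU
while 0x0000-0x0002 are skipped, and a range consisting only of 0x0000-0x0002
is rejected.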

Jan

Roger Pau Monné Oct. 9, 2024, 11:13 a.m. UTC | #4
On Wed, Oct 09, 2024 at 12:52:29PM +0200, Jan Beulich wrote:
> On 09.10.2024 10:03, Roger Pau Monné wrote:
> > On Tue, Oct 08, 2024 at 04:01:28PM +0200, Jan Beulich wrote:
> >> On 08.10.2024 12:47, Roger Pau Monne wrote:
> >>> IVMD table contains restrictions about memory which must be mandatory assigned
> >>> to devices (and which permissions it should use), or memory that should be
> >>> never accessible to devices.
> >>>
> >>> Some hardware however contains ranges in IVMD that reference devices outside of
> >>> the IVHD tables (in other words, devices not behind any IOMMU).  Such mismatch
> >>> will cause Xen to fail in register_range_for_device(), ultimately leading to
> >>> the IOMMU being disabled, and Xen crashing as x2APIC support might be already
> >>> enabled and relying on the IOMMU functionality.
> >>
> >> I find it hard to believe that on x86 systems with IOMMUs some devices would
> >> be left uncovered by any IOMMU. Is it possible that IVHD is flawed there? In
> >> which case we might rightfully refuse to boot? (Can you share e.g. that
> >> "iommu=debug" output that results from parsing the tables on that system?)
> > 
> > I'm afraid I don't have any of such systems to test myself, however I
> > have the contents of IVRS:
> > 
> >   ACPI Table Header
> > ------------------------------------------------------------------
> > Signature          : IVRS
> > Length             : 0x000001F8
> > Revision           : 0x02
> > Checksum           : 0x06
> > OEM ID             : AMD  
> > OEM Table ID       : AmdTable
> > OEM Revision       : 0x00000001
> > Creator ID         : AMD 
> > Creator Revision   : 0x00000001
> > IVinfo             : 0x00203043
> > 	  IVHD
> > 	----------------------------------------------------------------
> > 	Type                  : 0x10
> > 	Flags                 : 0xB0
> > 	Length                : 0x0044
> > 	IOMMU Device ID       : 0x0002
> > 	Capability Offset     : 0x0040
> > 	IOMMU Base Address    : 0x00000000FD200000
> > 	Segment Group         : 0x0000
> > 	IOMMU Info            : 0x0000
> > 	IOMMU Feature Info    : 0x80048F6E
> > 		  Range
> > 		--------------------------------------------------
> > 		Type                  : 0x03
> > 		Start of Range        : 0x0003
> > 		End of Range          : 0xFFFE
> > 		DTE Setting           : 0x00
> > 		  Alias Range
> > 		--------------------------------------------------
> > 		Type                  : 0x43
> > 		Start of Range        : 0xFF00
> > 		End of Range          : 0xFFFF
> > 		DTE Setting           : 0x00
> > 		Source Device ID      : 0x00A5
> > 		  Special Device
> > 		--------------------------------------------------
> > 		Type                  : 0x48
> > 		Device ID             : 0x0000
> > 		DTE Setting           : 0x00
> > 		Source Device ID      : 0x00A0
> > 		Handle                : 0x00
> > 		Variety               : HPET
> > 		  Special Device
> > 		--------------------------------------------------
> > 		Type                  : 0x48
> > 		Device ID             : 0x0000
> > 		DTE Setting           : 0xD7
> > 		Source Device ID      : 0x00A0
> > 		Handle                : 0x21
> > 		Variety               : IOAPIC
> > 		  Special Device
> > 		--------------------------------------------------
> > 		Type                  : 0x48
> > 		Device ID             : 0x0000
> > 		DTE Setting           : 0x00
> > 		Source Device ID      : 0x0001
> > 		Handle                : 0x22
> > 		Variety               : IOAPIC
> > 	  IVHD
> > 	----------------------------------------------------------------
> > 	Type                  : 0x11
> > 	Flags                 : 0x30
> > 	Length                : 0x0054
> > 	IOMMU Device ID       : 0x0002
> > 	Capability Offset     : 0x0040
> > 	IOMMU Base Address    : 0x00000000FD200000
> > 	Segment Group         : 0x0000
> > 	IOMMU Info            : 0x0000
> > 	IOMMU Feature Info    : 0x00048000
> > 		  Range
> > 		--------------------------------------------------
> > 		Type                  : 0x03
> > 		Start of Range        : 0x0003
> > 		End of Range          : 0xFFFE
> > 		DTE Setting           : 0x00
> > 		  Alias Range
> > 		--------------------------------------------------
> > 		Type                  : 0x43
> > 		Start of Range        : 0xFF00
> > 		End of Range          : 0xFFFF
> > 		DTE Setting           : 0x00
> > 		Source Device ID      : 0x00A5
> > 		  Special Device
> > 		--------------------------------------------------
> > 		Type                  : 0x48
> > 		Device ID             : 0x0000
> > 		DTE Setting           : 0x00
> > 		Source Device ID      : 0x00A0
> > 		Handle                : 0x00
> > 		Variety               : HPET
> > 		  Special Device
> > 		--------------------------------------------------
> > 		Type                  : 0x48
> > 		Device ID             : 0x0000
> > 		DTE Setting           : 0xD7
> > 		Source Device ID      : 0x00A0
> > 		Handle                : 0x21
> > 		Variety               : IOAPIC
> > 		  Special Device
> > 		--------------------------------------------------
> > 		Type                  : 0x48
> > 		Device ID             : 0x0000
> > 		DTE Setting           : 0x00
> > 		Source Device ID      : 0x0001
> > 		Handle                : 0x22
> > 		Variety               : IOAPIC
> > 	  IVMD
> > 	----------------------------------------------------------------
> > 	Type                                 : 0x22
> > 	Flags                                : 0x08
> > 	Length                               : 0x0020
> > 	DeviceID                             : 0x0000
> > 	AuxiliaryData                        : 0x0FFF
> > 	Reserved                             : 0x0000000000000000
> > 	IVMD Start Address                   : 0x0000000096191000
> > 	IVMD Memory Block Length             : 0x0000000000000022
> > 	  IVMD
> > 	----------------------------------------------------------------
> > 	Type                                 : 0x22
> > 	Flags                                : 0x08
> > 	Length                               : 0x0020
> > 	DeviceID                             : 0x0000
> > 	AuxiliaryData                        : 0x0FFF
> > 	Reserved                             : 0x0000000000000000
> > 	IVMD Start Address                   : 0x0000000097D9E000
> > 	IVMD Memory Block Length             : 0x0000000000000022
> > 	  IVMD
> > 	----------------------------------------------------------------
> > 	Type                                 : 0x22
> > 	Flags                                : 0x08
> > 	Length                               : 0x0020
> > 	DeviceID                             : 0x0000
> > 	AuxiliaryData                        : 0x0FFF
> > 	Reserved                             : 0x0000000000000000
> > 	IVMD Start Address                   : 0x0000000097D9D000
> > 	IVMD Memory Block Length             : 0x0000000000000022
> > 	  IVHD
> > 	----------------------------------------------------------------
> > 	Type                  : 0x40
> > 	Flags                 : 0x30
> > 	Length                : 0x00D0
> > 	IOMMU Device ID       : 0x0002
> > 	Capability Offset     : 0x0040
> > 	IOMMU Base Address    : 0x00000000FD200000
> > 	Segment Group         : 0x0000
> > 	IOMMU Info            : 0x0000
> > 	IOMMU Feature Info    : 0x00048000
> > 		  Range
> > 		--------------------------------------------------
> > 		Type                  : 0x03
> > 		Start of Range        : 0x0003
> > 		End of Range          : 0xFFFE
> > 		DTE Setting           : 0x00
> > 		  Alias Range
> > 		--------------------------------------------------
> > 		Type                  : 0x43
> > 		Start of Range        : 0xFF00
> > 		End of Range          : 0xFFFF
> > 		DTE Setting           : 0x00
> > 		Source Device ID      : 0x00A5
> > 		  Special Device
> > 		--------------------------------------------------
> > 		Type                  : 0x48
> > 		Device ID             : 0x0000
> > 		DTE Setting           : 0x00
> > 		Source Device ID      : 0x00A0
> > 		Handle                : 0x00
> > 		Variety               : HPET
> > 		  Special Device
> > 		--------------------------------------------------
> > 		Type                  : 0x48
> > 		Device ID             : 0x0000
> > 		DTE Setting           : 0xD7
> > 		Source Device ID      : 0x00A0
> > 		Handle                : 0x21
> > 		Variety               : IOAPIC
> > 		  Special Device
> > 		--------------------------------------------------
> > 		Type                  : 0x48
> > 		Device ID             : 0x0000
> > 		DTE Setting           : 0x00
> > 		Source Device ID      : 0x0001
> > 		Handle                : 0x22
> > 		Variety               : IOAPIC
> > 		  Variable Length ACPI HID Device
> > 		--------------------------------------------------
> > 		Type                  : 0xF0
> > 		Device ID             : 0x00A5
> > 		DTE Setting           : 0x40
> > 		Hardware ID           : AMDI0020
> > 		Extended DTE Setting  : 
> > 		Unique ID Format      : 2
> > 		Unique ID Length      : 9
> > 		Unique ID             : \_SB.FUR0
> > 		  Variable Length ACPI HID Device
> > 		--------------------------------------------------
> > 		Type                  : 0xF0
> > 		Device ID             : 0x00A5
> > 		DTE Setting           : 0x40
> > 		Hardware ID           : AMDI0020
> > 		Extended DTE Setting  : 
> > 		Unique ID Format      : 2
> > 		Unique ID Length      : 9
> > 		Unique ID             : \_SB.FUR1
> > 		  Variable Length ACPI HID Device
> > 		--------------------------------------------------
> > 		Type                  : 0xF0
> > 		Device ID             : 0x00A5
> > 		DTE Setting           : 0x40
> > 		Hardware ID           : AMDI0020
> > 		Extended DTE Setting  : 
> > 		Unique ID Format      : 2
> > 		Unique ID Length      : 9
> > 		Unique ID             : \_SB.FUR2
> > 		  Variable Length ACPI HID Device
> > 		--------------------------------------------------
> > 		Type                  : 0xF0
> > 		Device ID             : 0x00A5
> > 		DTE Setting           : 0x40
> > 		Hardware ID           : AMDI0020
> > 		Extended DTE Setting  : 
> > 		Unique ID Format      : 2
> > 		Unique ID Length      : 9
> > 		Unique ID             : \_SB.FUR3
> > 
> > FWIW, I've checked one of the AMD server systems we have in the
> > lab, and the IVHD entries are fairly similar to the ones here, as
> > neither the PCI Host Bridge nor the IOMMU is covered by any IVHD
> > block.  That system however doesn't have any IVMD blocks.
> 
> Mine are a little different. The Dinar (Fam15) has an IVHD entry just
> for the range 0-2 (host bridge, <nothing>, IOMMU). The Rome (Fam17)
> has an IVHD entry just for 0 (host bridge), but not for the IOMMU. I
> think it is entirely reasonable for host bridge(s) and IOMMU(s) to not
> be covered by any IVHD. They aren't devices that would require
> servicing by an IOMMU.
> 
> Looking at the code I think we want to do things a little differently
> though: Pull find_iommu_for_device() out of register_range_for_device()
> and have parse_ivmd_device_range() do the skipping when there's no
> IOMMU for a device.

What about parse_ivmd_device_select()?  The IOMMU check would also need
to be duplicated there, which is not ideal IMO.

> Plus error when no device in the range is covered
> by an IOMMU, or if any two devices are covered by different IOMMUs.

I'm not sure I understand your last comment: do you mean to return an
error if an IVMD block range covers devices assigned to different
IOMMUs?  If that's the case, I'm afraid I don't agree, I don't see
anywhere in the spec that notes an IVMD block range can apply to
devices assigned to different IOMMUs.

I also think returning an error when no device in the IVMD range is
covered by an IOMMU is dubious.  Xen will already print warning
messages about such firmware inconsistencies, but refusing to boot is
too strict.

Thanks, Roger.
Jan Beulich Oct. 9, 2024, 11:28 a.m. UTC | #5
On 09.10.2024 13:13, Roger Pau Monné wrote:
> On Wed, Oct 09, 2024 at 12:52:29PM +0200, Jan Beulich wrote:
>> On 09.10.2024 10:03, Roger Pau Monné wrote:
>>> On Tue, Oct 08, 2024 at 04:01:28PM +0200, Jan Beulich wrote:
>>>> On 08.10.2024 12:47, Roger Pau Monne wrote:
>>>>> IVMD table contains restrictions about memory which must be mandatory assigned
>>>>> to devices (and which permissions it should use), or memory that should be
>>>>> never accessible to devices.
>>>>>
>>>>> Some hardware however contains ranges in IVMD that reference devices outside of
>>>>> the IVHD tables (in other words, devices not behind any IOMMU).  Such mismatch
>>>>> will cause Xen to fail in register_range_for_device(), ultimately leading to
>>>>> the IOMMU being disabled, and Xen crashing as x2APIC support might be already
>>>>> enabled and relying on the IOMMU functionality.
>>>>
>>>> I find it hard to believe that on x86 systems with IOMMUs some devices would
>>>> be left uncovered by any IOMMU. Is it possible that IVHD is flawed there? In
>>>> which case we might rightfully refuse to boot? (Can you share e.g. that
>>>> "iommu=debug" output that results from parsing the tables on that system?)
>>>
>>> I'm afraid I don't have any such system to test myself; however, I
>>> have the contents of IVRS:
>>>
>>> [...]
>>>
>>> FWIW, I've checked one of the AMD server systems we have in the
>>> lab, and the IVHD entries are fairly similar to the ones here, as
>>> neither the PCI Host Bridge nor the IOMMU is covered by any IVHD
>>> block.  That system however doesn't have any IVMD blocks.
>>
>> Mine are a little different. The Dinar (Fam15) has an IVHD entry just
>> for the range 0-2 (host bridge, <nothing>, IOMMU). The Rome (Fam17)
>> has an IVHD entry just for 0 (host bridge), but not for the IOMMU. I
>> think it is entirely reasonable for host bridge(s) and IOMMU(s) to not
>> be covered by any IVHD. They aren't devices that would require
>> servicing by an IOMMU.
>>
>> Looking at the code I think we want to do things a little differently
>> though: Pull find_iommu_for_device() out of register_range_for_device()
>> and have parse_ivmd_device_range() do the skipping when there's no
>> IOMMU for a device.
> 
> What about parse_ivmd_device_select()?  The IOMMU check would also need
> to be duplicated there, which is not ideal IMO.

That's not ideal, but a reasonably small price to pay.

>> Plus error when no device in the range is covered
>> by an IOMMU, or if any two devices are covered by different IOMMUs.
> 
> I'm not sure I understand your last comment: do you mean to return an
> error if an IVMD block range covers devices assigned to different
> IOMMUs?  If that's the case, I'm afraid I don't agree, I don't see
> anywhere in the spec that notes an IVMD block range can apply to
> devices assigned to different IOMMUs.

Hmm, right, I take back that part.

> I also think returning an error when no device in the IVMD range is
> covered by an IOMMU is dubious.  Xen will already print warning
> messages about such firmware inconsistencies, but refusing to boot is
> too strict.

I disagree. We shouldn't enable DMA remapping in such an event. Whereas
the "refusing to boot" is interrupt remapping related iirc, if x2APIC
is already enabled. We need to properly separate the two (and the
discussion there was started quite a long time ago, but it got stuck at
some point); until such time it is simply an undesirable side effect of
the inappropriate implementation that in certain cases we fail to boot when
we shouldn't.

Jan
Roger Pau Monné Oct. 9, 2024, 11:47 a.m. UTC | #6
On Wed, Oct 09, 2024 at 01:28:19PM +0200, Jan Beulich wrote:
> On 09.10.2024 13:13, Roger Pau Monné wrote:
> > On Wed, Oct 09, 2024 at 12:52:29PM +0200, Jan Beulich wrote:
> >> On 09.10.2024 10:03, Roger Pau Monné wrote:
> >>> On Tue, Oct 08, 2024 at 04:01:28PM +0200, Jan Beulich wrote:
> >>>> On 08.10.2024 12:47, Roger Pau Monne wrote:
> >>>>> IVMD table contains restrictions about memory which must be mandatory assigned
> >>>>> to devices (and which permissions it should use), or memory that should be
> >>>>> never accessible to devices.
> >>>>>
> >>>>> Some hardware however contains ranges in IVMD that reference devices outside of
> >>>>> the IVHD tables (in other words, devices not behind any IOMMU).  Such mismatch
> >>>>> will cause Xen to fail in register_range_for_device(), ultimately leading to
> >>>>> the IOMMU being disabled, and Xen crashing as x2APIC support might be already
> >>>>> enabled and relying on the IOMMU functionality.
> >>>>
> >>>> I find it hard to believe that on x86 systems with IOMMUs some devices would
> >>>> be left uncovered by any IOMMU. Is it possible that IVHD is flawed there? In
> >>>> which case we might rightfully refuse to boot? (Can you share e.g. that
> >>>> "iommu=debug" output that results from parsing the tables on that system?)
> >>>
> >>> I'm afraid I don't have any such system to test myself; however, I
> >>> have the contents of IVRS:
> >>>
> >>> [...]
> >>>
> >>> FWIW, I've checked one of the AMD server systems we have in the
> >>> lab, and the IVHD entries are fairly similar to the ones here, as
> >>> neither the PCI Host Bridge nor the IOMMU is covered by any IVHD
> >>> block.  That system however doesn't have any IVMD blocks.
> >>
> >> Mine are a little different. The Dinar (Fam15) has an IVHD entry just
> >> for the range 0-2 (host bridge, <nothing>, IOMMU). The Rome (Fam17)
> >> has an IVHD entry just for 0 (host bridge), but not for the IOMMU. I
> >> think it is entirely reasonable for host bridge(s) and IOMMU(s) to not
> >> be covered by any IVHD. They aren't devices that would require
> >> servicing by an IOMMU.
> >>
> >> Looking at the code I think we want to do things a little differently
> >> though: Pull find_iommu_for_device() out of register_range_for_device()
> >> and have parse_ivmd_device_range() do the skipping when there's no
> >> IOMMU for a device.
> > 
> > What about parse_ivmd_device_select()?  The IOMMU check would also need
> > to be duplicated there, which is not ideal IMO.
> 
> That's not ideal, but a reasonably small price to pay.

Pulling the check out is only helpful if we plan to return an error
when an IVMD block references only devices not assigned to any IOMMU,
which I'm not convinced we should do.
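
To make the two options concrete, below is a minimal standalone sketch;
it is not code from the patch or from Xen.  lookup_iommu() is a
hypothetical stand-in for find_iommu_for_device(), hard-coded to the
0x0003-0xFFFE IVHD range from the dump above, and parse_range() with
its strict flag is likewise invented for illustration.  With strict
unset, devices without an IOMMU are skipped; with strict set, a range
in which no device has an IOMMU additionally returns -ENODEV.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Xen's per-IOMMU state; not the real struct. */
struct iommu { int id; };

static struct iommu demo_iommu = { .id = 0 };

/*
 * Toy replacement for find_iommu_for_device(): a single IOMMU covering
 * device IDs 0x0003..0xFFFE (assumption taken from the quoted IVHD range).
 */
static struct iommu *lookup_iommu(uint16_t bdf)
{
    return (bdf >= 0x0003 && bdf <= 0xFFFE) ? &demo_iommu : NULL;
}

/*
 * Walk the device range [first, last], skipping devices that are not
 * behind any IOMMU.  Returns the number of covered devices, or -ENODEV
 * when strict is set and not a single device in the range is covered.
 */
static int parse_range(uint16_t first, uint16_t last, bool strict)
{
    unsigned int covered = 0;
    uint32_t bdf; /* wider than uint16_t so the loop ends for last == 0xFFFF */

    assert(first <= last);

    for ( bdf = first; bdf <= last; bdf++ )
    {
        if ( !lookup_iommu((uint16_t)bdf) )
            continue; /* no IOMMU: nothing to program for this device */

        covered++;    /* real code would register the memory range here */
    }

    if ( strict && !covered )
        return -ENODEV;

    return (int)covered;
}
```

For example, parse_range(0x0000, 0x0002, true) fails with -ENODEV,
while parse_range(0x0000, 0x0FFF, true) succeeds because devices from
0x0003 onwards are covered.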

> >> Plus error when no device in the range is covered
> >> by an IOMMU, or if any two devices are covered by different IOMMUs.
> > 
> > I'm not sure I understand your last comment: do you mean to return an
> > error if an IVMD block range covers devices assigned to different
> > IOMMUs?  If that's the case, I'm afraid I don't agree, I don't see
> > anywhere in the spec that notes an IVMD block range can apply to
> > devices assigned to different IOMMUs.
> 
> Hmm, right, I take back that part.
> 
> > I also think returning an error when no device in the IVMD range is
> > covered by an IOMMU is dubious.  Xen will already print warning
> > messages about such firmware inconsistencies, but refusing to boot is
> > too strict.
> 
> I disagree. We shouldn't enable DMA remapping in such an event. Whereas

I'm not sure I understand why you would go as far as refusing to
enable DMA remapping.  How is an IVMD block that references some
devices not assigned to any IOMMU different from one where none of
the referenced devices is assigned to an IOMMU?  We should deal with
both in the same way.

If none of the devices in the IVMD block is covered by an IOMMU, the
IVMD block is useless.  But there's nothing for Xen to act on, since
the devices have no IOMMU assigned.  IOW: it would be the same
as booting natively without parsing the IVRS in the first place.

> the "refusing to boot" is interrupt remapping related iirc, if x2APIC
> is already enabled. We need to properly separate the two (and the
> discussion there was started quite a long time ago, but it got stuck at
> some point); until such time it is simply an undesirable side effect of
> the inappropriate implementation that in certain cases we fail to boot when
> we shouldn't.

Yes, but that's a different topic, and not something I plan to fix as
the scope of this patch :).

Thanks, Roger.
Jan Beulich Oct. 9, 2024, 12:09 p.m. UTC | #7
On 09.10.2024 13:47, Roger Pau Monné wrote:
> On Wed, Oct 09, 2024 at 01:28:19PM +0200, Jan Beulich wrote:
>> On 09.10.2024 13:13, Roger Pau Monné wrote:
>>> I also think returning an error when no device in the IVMD range is
>>> covered by an IOMMU is dubious.  Xen will already print warning
>>> messages about such firmware inconsistencies, but refusing to boot is
>>> too strict.
>>
>> I disagree. We shouldn't enable DMA remapping in such an event. Whereas
> 
> I'm not sure I understand why you would go as far as refusing to
> enable DMA remapping.  How is a IVMD block having references to some
> devices not assigned to any IOMMU different to all devices referenced
> not assigned to any IOMMU?  We should deal with both in the same
> way.

Precisely because of ...

> If all devices in the IVMD block are not covered by an IOMMU, the
> IVMD block is useless.

... this. We simply can't judge whether such a useless block really was
meant to cover something. If it was, we're hosed. Or maybe we screwed up
and wrongly conclude it's useless.

>  But there's nothing for Xen to action, due to
> the devices not having an IOMMU assigned.  IOW: it would be the same
> as booting natively without parsing the IVRS in the first place.

Not really, no. Not parsing IVRS means not turning on any IOMMU. We
then know we can't pass through any devices. We can't assess the
security of passing through devices (as far as it's under our control)
if we enable the IOMMU in perhaps a flawed way.

A formally valid IVMD we can't make sense of is imo no different from
a formally invalid IVMD, for which we would return ENODEV as well (and
hence fail to enable the IOMMU). Whereas what you're suggesting is, if
I take it further, to basically ignore (almost) all errors in table
parsing, and enable the IOMMU(s) in a best effort manner, no matter
whether that leads to a functional (let alone secure [to the degree
possible]) system.

What I don't really understand is why you want to relax our checking
beyond what's necessary for the one issue at hand.

>> the "refusing to boot" is interrupt remapping related iirc, if x2APIC
>> is already enabled. We need to properly separate the two (and the
>> discussion there was started quite a long time ago, but it got stuck at
>> some point); until such time it is simply an undesirable side effect of
>> the inappropriate implementation that in certain case we fail boot when
>> we shouldn't.
> 
> Yes, but that's a different topic, and not something I plan to fix as
> the scope of this patch :).

Sure, I'm merely asking to accept that, until that's resolved, issues
with boot failure can result here, and need to be lived with.

Jan
Teddy Astie Oct. 9, 2024, 12:28 p.m. UTC | #8
Hello,

On 09/10/2024 at 14:09, Jan Beulich wrote:
> On 09.10.2024 13:47, Roger Pau Monné wrote:
>> On Wed, Oct 09, 2024 at 01:28:19PM +0200, Jan Beulich wrote:
>>> On 09.10.2024 13:13, Roger Pau Monné wrote:
>>>> I also think returning an error when no device in the IVMD range is
>>>> covered by an IOMMU is dubious.  Xen will already print warning
>>>> messages about such firmware inconsistencies, but refusing to boot is
>>>> too strict.
>>>
>>> I disagree. We shouldn't enable DMA remapping in such an event. Whereas
>>
>> I'm not sure I understand why you would go as far as refusing to
>> enable DMA remapping.  How is a IVMD block having references to some
>> devices not assigned to any IOMMU different to all devices referenced
>> not assigned to any IOMMU?  We should deal with both in the same
>> way.
> 
> Precisely because of ...
> 
>> If all devices in the IVMD block are not covered by an IOMMU, the
>> IVMD block is useless.
> 
> ... this. We simply can't judge whether such a useless block really was
> meant to cover something. If it was, we're hosed. Or maybe we screwed up
> and wrongly conclude it's useless.
> 
>>   But there's nothing for Xen to action, due to
>> the devices not having an IOMMU assigned.  IOW: it would be the same
>> as booting natively without parsing the IVRS in the first place.
> 
> Not really, no. Not parsing IVRS means not turning on any IOMMU. We
> then know we can't pass through any devices. We can't assess the
> security of passing through devices (as far as it's under our control)
> if we enable the IOMMU in perhaps a flawed way.
> 
> A formally valid IVMD we can't make sense of is imo no different from
> a formally invalid IVMD, for which we would return ENODEV as well (and
> hence fail to enable the IOMMU). Whereas what you're suggesting is, if
> I take it further, to basically ignore (almost) all errors in table
> parsing, and enable the IOMMU(s) in a best effort manner, no matter
> whether that leads to a functional (let alone secure [to the degree
> possible]) system.
> 
> What I don't really understand is why you want to relax our checking
> beyond what's necessary for the one issue at hand.
> 
>>> the "refusing to boot" is interrupt remapping related iirc, if x2APIC
>>> is already enabled. We need to properly separate the two (and the
>>> discussion there was started quite a long time ago, but it got stuck at
>>> some point); until such time it is simply an undesirable side effect of
>>> the inappropriate implementation that in certain case we fail boot when
>>> we shouldn't.
>>
>> Yes, but that's a different topic, and not something I plan to fix as
>> the scope of this patch :).
> 
> Sure, I'm merely asking to accept that, until that's resolved, issues
> with boot failure can result here, and need to be lived with.
> 
> Jan
> 

Would it be possible to find a middle-ground by adding a "non-security 
supported" xen command-line option to allow a workaround on this issue ?

Something like iommu=amd-skip-unknown-ivmd ?

And preventing boot otherwise.

Teddy


Teddy Astie | Vates XCP-ng Intern

XCP-ng & Xen Orchestra - Vates solutions

web: https://vates.tech
Jan Beulich Oct. 9, 2024, 12:46 p.m. UTC | #9
On 09.10.2024 14:28, Teddy Astie wrote:
> Hello,
> 
> On 09/10/2024 at 14:09, Jan Beulich wrote:
>> On 09.10.2024 13:47, Roger Pau Monné wrote:
>>> On Wed, Oct 09, 2024 at 01:28:19PM +0200, Jan Beulich wrote:
>>>> On 09.10.2024 13:13, Roger Pau Monné wrote:
>>>>> I also think returning an error when no device in the IVMD range is
>>>>> covered by an IOMMU is dubious.  Xen will already print warning
>>>>> messages about such firmware inconsistencies, but refusing to boot is
>>>>> too strict.
>>>>
>>>> I disagree. We shouldn't enable DMA remapping in such an event. Whereas
>>>
>>> I'm not sure I understand why you would go as far as refusing to
>>> enable DMA remapping.  How is a IVMD block having references to some
>>> devices not assigned to any IOMMU different to all devices referenced
>>> not assigned to any IOMMU?  We should deal with both in the same
>>> way.
>>
>> Precisely because of ...
>>
>>> If all devices in the IVMD block are not covered by an IOMMU, the
>>> IVMD block is useless.
>>
>> ... this. We simply can't judge whether such a useless block really was
>> meant to cover something. If it was, we're hosed. Or maybe we screwed up
>> and wrongly conclude it's useless.
>>
>>>   But there's nothing for Xen to action, due to
>>> the devices not having an IOMMU assigned.  IOW: it would be the same
>>> as booting natively without parsing the IVRS in the first place.
>>
>> Not really, no. Not parsing IVRS means not turning on any IOMMU. We
>> then know we can't pass through any devices. We can't assess the
>> security of passing through devices (as far as it's under our control)
>> if we enable the IOMMU in perhaps a flawed way.
>>
>> A formally valid IVMD we can't make sense of is imo no different from
>> a formally invalid IVMD, for which we would return ENODEV as well (and
>> hence fail to enable the IOMMU). Whereas what you're suggesting is, if
>> I take it further, to basically ignore (almost) all errors in table
>> parsing, and enable the IOMMU(s) in a best effort manner, no matter
>> whether that leads to a functional (let alone secure [to the degree
>> possible]) system.
>>
>> What I don't really understand is why you want to relax our checking
>> beyond what's necessary for the one issue at hand.
>>
>>>> the "refusing to boot" is interrupt remapping related iirc, if x2APIC
>>>> is already enabled. We need to properly separate the two (and the
>>>> discussion there was started quite a long time ago, but it got stuck at
>>>> some point); until such time it is simply an undesirable side effect of
>>>> the inappropriate implementation that in certain case we fail boot when
>>>> we shouldn't.
>>>
>>> Yes, but that's a different topic, and not something I plan to fix as
>>> the scope of this patch :).
>>
>> Sure, I'm merely asking to accept that, until that's resolved, issues
>> with boot failure can result here, and need to be lived with.
> 
> Would it be possible to find a middle-ground by adding a "non-security 
> supported" xen command-line option to allow a workaround on this issue ?
> 
> Something like iommu=amd-skip-unknown-ivmd ?

Do we need to go as far? Isn't "iommu=off" enough of a (boot) workaround?

Jan
Roger Pau Monné Oct. 9, 2024, 1:37 p.m. UTC | #10
On Wed, Oct 09, 2024 at 02:09:33PM +0200, Jan Beulich wrote:
> On 09.10.2024 13:47, Roger Pau Monné wrote:
> > On Wed, Oct 09, 2024 at 01:28:19PM +0200, Jan Beulich wrote:
> >> On 09.10.2024 13:13, Roger Pau Monné wrote:
> >>> I also think returning an error when no device in the IVMD range is
> >>> covered by an IOMMU is dubious.  Xen will already print warning
> >>> messages about such firmware inconsistencies, but refusing to boot is
> >>> too strict.
> >>
> >> I disagree. We shouldn't enable DMA remapping in such an event. Whereas
> > 
> > I'm not sure I understand why you would go as far as refusing to
> > enable DMA remapping.  How is a IVMD block having references to some
> > devices not assigned to any IOMMU different to all devices referenced
> > not assigned to any IOMMU?  We should deal with both in the same
> > way.
> 
> Precisely because of ...
> 
> > If all devices in the IVMD block are not covered by an IOMMU, the
> > IVMD block is useless.
> 
> ... this. We simply can't judge whether such a useless block really was
> meant to cover something. If it was, we're hosed. Or maybe we screwed up
> and wrongly conclude it's useless.

The same could be stated about devices in a IVMD block that are not
assigned to an IOMMU: it could also be Xen that screwed up and wrongly
concluded they are not assigned to an IOMMU.

> >  But there's nothing for Xen to action, due to
> > the devices not having an IOMMU assigned.  IOW: it would be the same
> > as booting natively without parsing the IVRS in the first place.
> 
> Not really, no. Not parsing IVRS means not turning on any IOMMU. We
> then know we can't pass through any devices. We can't assess the
> security of passing through devices (as far as it's under our control)
> if we enable the IOMMU in perhaps a flawed way.
> 
> A formally valid IVMD we can't make sense of is imo no different from
> a formally invalid IVMD, for which we would return ENODEV as well (and
> hence fail to enable the IOMMU). Whereas what you're suggesting is, if
> I take it further, to basically ignore (almost) all errors in table
> parsing, and enable the IOMMU(s) in a best effort manner, no matter
> whether that leads to a functional (let alone secure [to the degree
> possible]) system.

No, don't take it further: not ignore all errors, I think we should
ignore errors when the device in the IVMD is not behind an IOMMU.  And
I think that should apply globally, regardless of whether all devices
in the IVMD block fall in that category.

That will bring AMD-Vi inline with VT-d RMRR, as from what I can see
acpi_parse_one_rmrr() doesn't care whether the device scope in the
entry is behind an IOMMU or not, or whether the whole RMRR doesn't
effectively apply to any device because none of them is covered by an
IOMMU.

> What I don't really understand is why you want to relax our checking
> beyond what's necessary for the one issue at hand.

This issue has been reported to us and we have been able to debug.
However, I worry what other malformed IVMD blocks might be out there,
for example an IVMD block that applies to a single device (type 21h),
but such single device doesn't exist (or it's not behind an IOMMU).
Maybe next time we simply won't get a report, the user will try Xen,
see it's not working and move to something else.

I've taken a quick look at Linux, and when parsing the IVMD blocks
there's no checking that referred devices are behind an IOMMU, the
regions are just added to a list.

Thanks, Roger.
Marek Marczykowski-Górecki Oct. 10, 2024, 12:45 p.m. UTC | #11
On Wed, Oct 09, 2024 at 03:37:06PM +0200, Roger Pau Monné wrote:
> On Wed, Oct 09, 2024 at 02:09:33PM +0200, Jan Beulich wrote:
> > On 09.10.2024 13:47, Roger Pau Monné wrote:
> > > On Wed, Oct 09, 2024 at 01:28:19PM +0200, Jan Beulich wrote:
> > >> On 09.10.2024 13:13, Roger Pau Monné wrote:
> > >>> I also think returning an error when no device in the IVMD range is
> > >>> covered by an IOMMU is dubious.  Xen will already print warning
> > >>> messages about such firmware inconsistencies, but refusing to boot is
> > >>> too strict.
> > >>
> > >> I disagree. We shouldn't enable DMA remapping in such an event. Whereas
> > > 
> > > I'm not sure I understand why you would go as far as refusing to
> > > enable DMA remapping.  How is a IVMD block having references to some
> > > devices not assigned to any IOMMU different to all devices referenced
> > > not assigned to any IOMMU?  We should deal with both in the same
> > > way.
> > 
> > Precisely because of ...
> > 
> > > If all devices in the IVMD block are not covered by an IOMMU, the
> > > IVMD block is useless.
> > 
> > ... this. We simply can't judge whether such a useless block really was
> > meant to cover something. If it was, we're hosed. Or maybe we screwed up
> > and wrongly conclude it's useless.
> 
> The same could be stated about devices in a IVMD block that are not
> assigned to an IOMMU: it could also be Xen that screwed up and wrongly
> concluded they are not assigned to an IOMMU.
> 
> > >  But there's nothing for Xen to action, due to
> > > the devices not having an IOMMU assigned.  IOW: it would be the same
> > > as booting natively without parsing the IVRS in the first place.
> > 
> > Not really, no. Not parsing IVRS means not turning on any IOMMU. We
> > then know we can't pass through any devices. We can't assess the
> > security of passing through devices (as far as it's under our control)
> > if we enable the IOMMU in perhaps a flawed way.
> > 
> > A formally valid IVMD we can't make sense of is imo no different from
> > a formally invalid IVMD, for which we would return ENODEV as well (and
> > hence fail to enable the IOMMU). Whereas what you're suggesting is, if
> > I take it further, to basically ignore (almost) all errors in table
> > parsing, and enable the IOMMU(s) in a best effort manner, no matter
> > whether that leads to a functional (let alone secure [to the degree
> > possible]) system.
> 
> No, don't take it further: not ignore all errors, I think we should
> ignore errors when the device in the IVMD is not behind an IOMMU.  And
> I think that should apply globally, regardless of whether all devices
> in the IVMD block fall in that category.
> 
> That will bring AMD-Vi inline with VT-d RMRR, as from what I can see
> acpi_parse_one_rmrr() doesn't care whether the device scope in the
> entry is behind an IOMMU or not, or whether the whole RMRR doesn't
> effectively apply to any device because none of them is covered by an
> IOMMU.
> 
> > What I don't really understand is why you want to relax our checking
> > beyond what's necessary for the one issue at hand.
> 
> This issue has been reported to us and we have been able to debug.
> However, I worry what other malformed IVMD blocks might be out there,
> for example an IVMD block that applies to a single device (type 21h),
> but such single device doesn't exist (or it's not behind an IOMMU).
> Maybe next time we simply won't get a report, the user will try Xen,
> see it's not working and move to something else.
> 
> I've taken a quick look at Linux, and when parsing the IVMD blocks
> there's no checking that referred devices are behind an IOMMU, the
> regions are just added to a list.

It seems Jan's concern is about passthrough of a device whose IVMD
entry Xen incorrectly ignored. But that doesn't really happen - if
the device doesn't exist (at least according to Xen) or isn't behind an
IOMMU (at least according to Xen), it surely won't be used with
passthrough, no? So, it should be safe to not fail in either of those
cases, as long as the given IVMD is applied to the other devices (those
that are eligible for passthrough) in its range.

Just my 2c.
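
For illustration only, the two policies argued over in this thread for an
IVMD range block can be sketched as below. This is a simplified model, not
Xen code: sketch_behind_iommu(), sketch_apply_range() and the device
numbering are hypothetical, and the real parser works on segment/BDF pairs
and unity mappings rather than a plain counter.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical lookup: pretend only devices 0x00..0x0f are behind an IOMMU.
 * In Xen this role is played by find_iommu_for_device().
 */
static bool sketch_behind_iommu(unsigned int bdf)
{
    return bdf <= 0x0f;
}

/*
 * Walk an IVMD-style device range [first, last] (assumes last < UINT_MAX).
 * Devices without an IOMMU are skipped instead of aborting the walk.
 * Returns the number of devices the constraint was applied to, or -ENODEV
 * when fail_if_none is set and no device in the range was covered.
 */
static int sketch_apply_range(unsigned int first, unsigned int last,
                              bool fail_if_none)
{
    int applied = 0;

    for ( unsigned int bdf = first; bdf <= last; bdf++ )
    {
        if ( !sketch_behind_iommu(bdf) )
            continue; /* nothing to program: no IOMMU translates this device */
        applied++;    /* a real parser would record the unity mapping here */
    }

    if ( applied == 0 && fail_if_none )
        return -ENODEV; /* stricter policy: a fully uncovered block is an error */

    return applied;
}
```

With fail_if_none set to false the model follows the relaxed behaviour
proposed in the patch; setting it to true models the stricter alternative of
rejecting a block in which no device is covered.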
Jan Beulich Oct. 15, 2024, 6:38 a.m. UTC | #12
On 09.10.2024 15:37, Roger Pau Monné wrote:
> On Wed, Oct 09, 2024 at 02:09:33PM +0200, Jan Beulich wrote:
>> On 09.10.2024 13:47, Roger Pau Monné wrote:
>>> On Wed, Oct 09, 2024 at 01:28:19PM +0200, Jan Beulich wrote:
>>>> On 09.10.2024 13:13, Roger Pau Monné wrote:
>>>>> I also think returning an error when no device in the IVMD range is
>>>>> covered by an IOMMU is dubious.  Xen will already print warning
>>>>> messages about such firmware inconsistencies, but refusing to boot is
>>>>> too strict.
>>>>
>>>> I disagree. We shouldn't enable DMA remapping in such an event. Whereas
>>>
>>> I'm not sure I understand why you would go as far as refusing to
>>> enable DMA remapping.  How is a IVMD block having references to some
>>> devices not assigned to any IOMMU different to all devices referenced
>>> not assigned to any IOMMU?  We should deal with both in the same
>>> way.
>>
>> Precisely because of ...
>>
>>> If all devices in the IVMD block are not covered by an IOMMU, the
>>> IVMD block is useless.
>>
>> ... this. We simply can't judge whether such a useless block really was
>> meant to cover something. If it was, we're hosed. Or maybe we screwed up
>> and wrongly conclude it's useless.
> 
> The same could be stated about devices in a IVMD block that are not
> assigned to an IOMMU: it could also be Xen that screwed up and wrongly
> concluded they are not assigned to an IOMMU.
> 
>>>  But there's nothing for Xen to action, due to
>>> the devices not having an IOMMU assigned.  IOW: it would be the same
>>> as booting natively without parsing the IVRS in the first place.
>>
>> Not really, no. Not parsing IVRS means not turning on any IOMMU. We
>> then know we can't pass through any devices. We can't assess the
>> security of passing through devices (as far as it's under our control)
>> if we enable the IOMMU in perhaps a flawed way.
>>
>> A formally valid IVMD we can't make sense of is imo no different from
>> a formally invalid IVMD, for which we would return ENODEV as well (and
>> hence fail to enable the IOMMU). Whereas what you're suggesting is, if
>> I take it further, to basically ignore (almost) all errors in table
>> parsing, and enable the IOMMU(s) in a best effort manner, no matter
>> whether that leads to a functional (let alone secure [to the degree
>> possible]) system.
> 
> No, don't take it further: not ignore all errors, I think we should
> ignore errors when the device in the IVMD is not behind an IOMMU.  And
> I think that should apply globally, regardless of whether all devices
> in the IVMD block fall in that category.
> 
> That will bring AMD-Vi inline with VT-d RMRR, as from what I can see
> acpi_parse_one_rmrr() doesn't care whether the device scope in the
> entry is behind an IOMMU or not, or whether the whole RMRR doesn't
> effectively apply to any device because none of them is covered by an
> IOMMU.
> 
>> What I don't really understand is why you want to relax our checking
>> beyond what's necessary for the one issue at hand.
> 
> This issue has been reported to us and we have been able to debug.
> However, I worry what other malformed IVMD blocks might be out there,
> for example an IVMD block that applies to a single device (type 21h),
> but such single device doesn't exist (or it's not behind an IOMMU).
> Maybe next time we simply won't get a report, the user will try Xen,
> see it's not working and move to something else.
> 
> I've taken a quick look at Linux, and when parsing the IVMD blocks
> there's no checking that referred devices are behind an IOMMU, the
> regions are just added to a list.

Hmm, okay, after some more chewing on it on this basis
Acked-by: Jan Beulich <jbeulich@suse.com>

Jan
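
As a rough, self-contained model of the per-device behaviour change being
acked here (not the Xen implementation): before the change, a device without
an IOMMU made registration fail with -ENODEV; after it, the device is
reported and skipped. The names model_find_iommu() and
model_register_device(), the strict flag and the 0x100 cut-off are
assumptions made purely for this sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for an IOMMU descriptor. */
struct iommu_model {
    int id;
};

static struct iommu_model the_iommu = { 0 };

/* Hypothetical lookup: only devices with bdf below 0x100 have an IOMMU. */
static struct iommu_model *model_find_iommu(unsigned int bdf)
{
    return bdf < 0x100 ? &the_iommu : NULL;
}

/*
 * strict != 0 models the old behaviour (missing IOMMU is a hard error);
 * strict == 0 models the relaxed behaviour (warn, then carry on).
 */
static int model_register_device(unsigned int bdf, int strict)
{
    if ( model_find_iommu(bdf) == NULL )
    {
        if ( strict )
            return -ENODEV;

        fprintf(stderr, "IVMD: no IOMMU for device %#x - ignoring\n", bdf);
        return 0;
    }

    /* A real implementation would set up the unity mapping here. */
    return 0;
}
```

In this model only the strict variant can propagate an error to the caller,
which is the path that, per the commit message, ends with the IOMMU being
disabled.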

Patch

diff --git a/xen/drivers/passthrough/amd/iommu_acpi.c b/xen/drivers/passthrough/amd/iommu_acpi.c
index 3f5508eba049..c416120326c9 100644
--- a/xen/drivers/passthrough/amd/iommu_acpi.c
+++ b/xen/drivers/passthrough/amd/iommu_acpi.c
@@ -248,8 +248,9 @@  static int __init register_range_for_device(
     iommu = find_iommu_for_device(seg, bdf);
     if ( !iommu )
     {
-        AMD_IOMMU_ERROR("IVMD: no IOMMU for Dev_Id %#x\n", bdf);
-        return -ENODEV;
+        AMD_IOMMU_WARN("IVMD: no IOMMU for device %pp - ignoring constrain\n",
+                       &PCI_SBDF(seg, bdf));
+        return 0;
     }
     req = ivrs_mappings[bdf].dte_requestor_id;