diff mbox series

VT-d/DMAR: accept DRHD with non-discoverable PCI devices

Message ID 1553113323-14664-1-git-send-email-igor.druzhinin@citrix.com (mailing list archive)
State New, archived
Headers show
Series VT-d/DMAR: accept DRHD with non-discoverable PCI devices | expand

Commit Message

Igor Druzhinin March 20, 2019, 8:22 p.m. UTC
Since commit dcf41790 ("x86/mmcfg/drhd: Move acpi_mmcfg_init() call
before calling acpi_parse_dmar()") PCI segment 0 is now known early
which made the sanity check on DRHD definition structure to work.
This, in turn, caused a regression on some machines (in particular,
HP PowerEdge R740 with I/O AT DMA disabled) where IOMMU was explicitly
disabled due to some internal PCI devices being non-discoverable but
present in DMAR.

While this is indeed a BIOS mistake it seems to be not that critical
to disable the whole IOMMU. Instead, extend the scope of
"workaround_bios_bug" option and make it enabled by default. This is
consistent with our documentation and actually what a user might expect
from an option with that name. It also doesn't seem safe to simply ignore
DRHD without initialization so remove this case. But leave the original
DMAR check in place to still allow error reporting.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
---
 docs/misc/xen-command-line.pandoc  |  2 +-
 xen/drivers/passthrough/iommu.c    |  2 +-
 xen/drivers/passthrough/vtd/dmar.c | 25 ++++++-------------------
 3 files changed, 8 insertions(+), 21 deletions(-)

Comments

Andrew Cooper March 21, 2019, 1:17 p.m. UTC | #1
On 20/03/2019 20:22, Igor Druzhinin wrote:
> Since commit dcf41790 ("x86/mmcfg/drhd: Move acpi_mmcfg_init() call
> before calling acpi_parse_dmar()") PCI segment 0 is now known early
> which made the sanity check on DRHD definition structure to work.
> This, in turn, caused a regression on some machines (in particular,
> HP PowerEdge R740 with I/O AT DMA disabled) where IOMMU was explicitly
> disabled due to some internal PCI devices being non-discoverable but
> present in DMAR.
>
> While this is indeed a BIOS mistake it seems to be not that critical
> to disable the whole IOMMU. Instead, extend the scope of
> "workaround_bios_bug" option and make it enabled by default. This is
> consistent with our documentation and actually what a user might expect
> from an option with that name. It also doesn't seem safe to simply ignore
> DRHD without initialization so remove this case. But leave the original
> DMAR check in place to still allow error reporting.
>
> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>

This is code which was previously dead.  The behaviour of ignoring an
IOMMU because there is unreachable device behind it is awful and
shouldn't exist, but we should at least leave a trace of it in the logs.

TBH, I'd prefer to delete the `workaround_bios_bug` option entirely, and
just print out the bad entries.  Nothing the option does is worthy of
shutting the IOMMU down.

CC-ing Juergen as this is a 4.12 regression.

I can easily spin a patch along my suggested route if others agree.

~Andrew
Jürgen Groß March 21, 2019, 1:50 p.m. UTC | #2
On 21/03/2019 14:17, Andrew Cooper wrote:
> On 20/03/2019 20:22, Igor Druzhinin wrote:
>> Since commit dcf41790 ("x86/mmcfg/drhd: Move acpi_mmcfg_init() call
>> before calling acpi_parse_dmar()") PCI segment 0 is now known early
>> which made the sanity check on DRHD definition structure to work.
>> This, in turn, caused a regression on some machines (in particular,
>> HP PowerEdge R740 with I/O AT DMA disabled) where IOMMU was explicitly
>> disabled due to some internal PCI devices being non-discoverable but
>> present in DMAR.
>>
>> While this is indeed a BIOS mistake it seems to be not that critical
>> to disable the whole IOMMU. Instead, extend the scope of
>> "workaround_bios_bug" option and make it enabled by default. This is
>> consistent with our documentation and actually what a user might expect
>> from an option with that name. It also doesn't seem safe to simply ignore
>> DRHD without initialization so remove this case. But leave the original
>> DMAR check in place to still allow error reporting.
>>
>> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
> 
> This is code which was previously dead.  The behaviour of ignoring an
> IOMMU because there is unreachable device behind it is awful and
> shouldn't exist, but we should at least leave a trace of it in the logs.
> 
> TBH, I'd prefer to delete the `workaround_bios_bug` option entirely, and
> just print out the bad entries.  Nothing the option does is worthy of
> shutting the IOMMU down.
> 
> CC-ing Juergen as this is a 4.12 regression.
> 
> I can easily spin a patch along my suggested route if others agree.

Jan has some days off, AFAIK he is going to be back next Monday.

I'd appreciate a rather fast agreement for 4.12.


Juergen
Pasi Kärkkäinen March 21, 2019, 4:55 p.m. UTC | #3
On Wed, Mar 20, 2019 at 08:22:03PM +0000, Igor Druzhinin wrote:
> Since commit dcf41790 ("x86/mmcfg/drhd: Move acpi_mmcfg_init() call
> before calling acpi_parse_dmar()") PCI segment 0 is now known early
> which made the sanity check on DRHD definition structure to work.
> This, in turn, caused a regression on some machines (in particular,
> HP PowerEdge R740 with I/O AT DMA disabled) where IOMMU was explicitly
^^^^

That's probably a typo.. sounds like it should say "Dell PowerEdge R740", not HP.. 


-- Pasi

> disabled due to some internal PCI devices being non-discoverable but
> present in DMAR.
> 
> While this is indeed a BIOS mistake it seems to be not that critical
> to disable the whole IOMMU. Instead, extend the scope of
> "workaround_bios_bug" option and make it enabled by default. This is
> consistent with our documentation and actually what a user might expect
> from an option with that name. It also doesn't seem safe to simply ignore
> DRHD without initialization so remove this case. But leave the original
> DMAR check in place to still allow error reporting.
> 
> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Igor Druzhinin March 21, 2019, 6:04 p.m. UTC | #4
On 21/03/2019 13:17, Andrew Cooper wrote:
> On 20/03/2019 20:22, Igor Druzhinin wrote:
>> Since commit dcf41790 ("x86/mmcfg/drhd: Move acpi_mmcfg_init() call
>> before calling acpi_parse_dmar()") PCI segment 0 is now known early
>> which made the sanity check on DRHD definition structure to work.
>> This, in turn, caused a regression on some machines (in particular,
>> HP PowerEdge R740 with I/O AT DMA disabled) where IOMMU was explicitly
>> disabled due to some internal PCI devices being non-discoverable but
>> present in DMAR.
>>
>> While this is indeed a BIOS mistake it seems to be not that critical
>> to disable the whole IOMMU. Instead, extend the scope of
>> "workaround_bios_bug" option and make it enabled by default. This is
>> consistent with our documentation and actually what a user might expect
>> from an option with that name. It also doesn't seem safe to simply ignore
>> DRHD without initialization so remove this case. But leave the original
>> DMAR check in place to still allow error reporting.
>>
>> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
> 
> This is code which was previously dead.  The behaviour of ignoring an
> IOMMU because there is unreachable device behind it is awful and
> shouldn't exist, but we should at least leave a trace of it in the logs.
> 
> TBH, I'd prefer to delete the `workaround_bios_bug` option entirely, and
> just print out the bad entries.  Nothing the option does is worthy of
> shutting the IOMMU down.

The option left there to provide an easy fallback to the originally
intended behavior in case someone has a strong opinion that it might be
useful. Other than that, I'm ok with removing it.

Igor
diff mbox series

Patch

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index b0b6300..9413354 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -1261,7 +1261,7 @@  The following options are specific to Intel VT-d hardware:
     similar to Linux's `intel_iommu=igfx_off` option.  If specifying `no-igfx`
     fixes anything, please report the problem.
 
-*   The `workaround_bios_bug` boolean is disabled by default.  It can be used
+*   The `workaround_bios_bug` boolean is enabled by default.  It can be used
     to ignore errors when parsing the ACPI tables, and finding a listed PCI
     device which doesn't appear to exist in the system.
 
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index 5ecaa10..de3a9eb 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -30,7 +30,7 @@  bool_t __initdata iommu_enable = 1;
 bool_t __read_mostly iommu_enabled;
 bool_t __read_mostly force_iommu;
 bool_t __read_mostly iommu_verbose;
-bool_t __read_mostly iommu_workaround_bios_bug;
+bool_t __read_mostly iommu_workaround_bios_bug = 1;
 bool_t __read_mostly iommu_igfx = 1;
 bool_t __read_mostly iommu_snoop = 1;
 bool_t __read_mostly iommu_qinval = 1;
diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c
index ac10602..9526ee7 100644
--- a/xen/drivers/passthrough/vtd/dmar.c
+++ b/xen/drivers/passthrough/vtd/dmar.c
@@ -553,26 +553,13 @@  acpi_parse_one_drhd(struct acpi_dmar_header *header)
             }
         }
 
-        if ( invalid_cnt )
+        if ( invalid_cnt && !iommu_workaround_bios_bug )
         {
-            if ( iommu_workaround_bios_bug &&
-                 invalid_cnt == dmaru->scope.devices_cnt )
-            {
-                printk(XENLOG_WARNING VTDPREFIX
-                       "  Workaround BIOS bug: ignoring DRHD (no devices in its scope are PCI discoverable)\n");
-
-                scope_devices_free(&dmaru->scope);
-                iommu_free(dmaru);
-                xfree(dmaru);
-            }
-            else
-            {
-                printk(XENLOG_WARNING VTDPREFIX
-                       "  DRHD is invalid (some devices in its scope are not PCI discoverable)\n");
-                printk(XENLOG_WARNING VTDPREFIX
-                       "  Try \"iommu=force\" or \"iommu=workaround_bios_bug\" if you really want VT-d\n");
-                ret = -EINVAL;
-            }
+            printk(XENLOG_WARNING VTDPREFIX
+                   "  DRHD is invalid (some devices in its scope are not PCI discoverable)\n");
+            printk(XENLOG_WARNING VTDPREFIX
+                   "  Try \"iommu=force\" or \"iommu=workaround_bios_bug\" if you really want VT-d\n");
+            ret = -EINVAL;
         }
         else
             acpi_register_drhd_unit(dmaru);