Message ID | 1553113323-14664-1-git-send-email-igor.druzhinin@citrix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | VT-d/DMAR: accept DRHD with non-discoverable PCI devices | expand |
On 20/03/2019 20:22, Igor Druzhinin wrote: > Since commit dcf41790 ("x86/mmcfg/drhd: Move acpi_mmcfg_init() call > before calling acpi_parse_dmar()") PCI segment 0 is now known early > which made the sanity check on DRHD definition structure to work. > This, in turn, caused a regression on some machines (in particular, > HP PowerEdge R740 with I/O AT DMA disabled) where IOMMU was explicitly > disabled due to some internal PCI devices being non-discoverable but > present in DMAR. > > While this is indeed a BIOS mistake it seems to be not that critical > to disable the whole IOMMU. Instead, extend the scope of > "workaround_bios_bug" option and make it enabled by default. This is > consistent with our documentation and actually what a user might expect > from an option with that name. It also doesn't seem safe to simply ignore > DRHD without initialization so remove this case. But leave the original > DMAR check in place to still allow error reporting. > > Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com> This is code which was previously dead. The behaviour of ignoring an IOMMU because there is unreachable device behind it is awful and shouldn't exist, but we should at least leave a trace of it in the logs. TBH, I'd prefer to delete the `workaround_bios_bug` option entirely, and just print out the bad entries. Nothing the option does is worthy of shutting the IOMMU down. CC-ing Juergen as this is a 4.12 regression. I can easily spin a patch along my suggested route if others agree. ~Andrew
On 21/03/2019 14:17, Andrew Cooper wrote: > On 20/03/2019 20:22, Igor Druzhinin wrote: >> Since commit dcf41790 ("x86/mmcfg/drhd: Move acpi_mmcfg_init() call >> before calling acpi_parse_dmar()") PCI segment 0 is now known early >> which made the sanity check on DRHD definition structure to work. >> This, in turn, caused a regression on some machines (in particular, >> HP PowerEdge R740 with I/O AT DMA disabled) where IOMMU was explicitly >> disabled due to some internal PCI devices being non-discoverable but >> present in DMAR. >> >> While this is indeed a BIOS mistake it seems to be not that critical >> to disable the whole IOMMU. Instead, extend the scope of >> "workaround_bios_bug" option and make it enabled by default. This is >> consistent with our documentation and actually what a user might expect >> from an option with that name. It also doesn't seem safe to simply ignore >> DRHD without initialization so remove this case. But leave the original >> DMAR check in place to still allow error reporting. >> >> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com> > > This is code which was previously dead. The behaviour of ignoring an > IOMMU because there is unreachable device behind it is awful and > shouldn't exist, but we should at least leave a trace of it in the logs. > > TBH, I'd prefer to delete the `workaround_bios_bug` option entirely, and > just print out the bad entries. Nothing the option does is worthy of > shutting the IOMMU down. > > CC-ing Juergen as this is a 4.12 regression. > > I can easily spin a patch along my suggested route if others agree. Jan has some days off, AFAIK he is going to be back next Monday. I'd appreciate a rather fast agreement for 4.12. Juergen
On Wed, Mar 20, 2019 at 08:22:03PM +0000, Igor Druzhinin wrote: > Since commit dcf41790 ("x86/mmcfg/drhd: Move acpi_mmcfg_init() call > before calling acpi_parse_dmar()") PCI segment 0 is now known early > which made the sanity check on DRHD definition structure to work. > This, in turn, caused a regression on some machines (in particular, > HP PowerEdge R740 with I/O AT DMA disabled) where IOMMU was explicitly ^^^^ That's probably a typo.. sounds like it should say "Dell PowerEdge R740", not HP.. -- Pasi > disabled due to some internal PCI devices being non-discoverable but > present in DMAR. > > While this is indeed a BIOS mistake it seems to be not that critical > to disable the whole IOMMU. Instead, extend the scope of > "workaround_bios_bug" option and make it enabled by default. This is > consistent with our documentation and actually what a user might expect > from an option with that name. It also doesn't seem safe to simply ignore > DRHD without initialization so remove this case. But leave the original > DMAR check in place to still allow error reporting. > > Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
On 21/03/2019 13:17, Andrew Cooper wrote: > On 20/03/2019 20:22, Igor Druzhinin wrote: >> Since commit dcf41790 ("x86/mmcfg/drhd: Move acpi_mmcfg_init() call >> before calling acpi_parse_dmar()") PCI segment 0 is now known early >> which made the sanity check on DRHD definition structure to work. >> This, in turn, caused a regression on some machines (in particular, >> HP PowerEdge R740 with I/O AT DMA disabled) where IOMMU was explicitly >> disabled due to some internal PCI devices being non-discoverable but >> present in DMAR. >> >> While this is indeed a BIOS mistake it seems to be not that critical >> to disable the whole IOMMU. Instead, extend the scope of >> "workaround_bios_bug" option and make it enabled by default. This is >> consistent with our documentation and actually what a user might expect >> from an option with that name. It also doesn't seem safe to simply ignore >> DRHD without initialization so remove this case. But leave the original >> DMAR check in place to still allow error reporting. >> >> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com> > > This is code which was previously dead. The behaviour of ignoring an > IOMMU because there is unreachable device behind it is awful and > shouldn't exist, but we should at least leave a trace of it in the logs. > > TBH, I'd prefer to delete the `workaround_bios_bug` option entirely, and > just print out the bad entries. Nothing the option does is worthy of > shutting the IOMMU down. The option left there to provide an easy fallback to the originally intended behavior in case someone has a strong opinion that it might be useful. Other than that, I'm ok with removing it. Igor
diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc index b0b6300..9413354 100644 --- a/docs/misc/xen-command-line.pandoc +++ b/docs/misc/xen-command-line.pandoc @@ -1261,7 +1261,7 @@ The following options are specific to Intel VT-d hardware: similar to Linux's `intel_iommu=igfx_off` option. If specifying `no-igfx` fixes anything, please report the problem. -* The `workaround_bios_bug` boolean is disabled by default. It can be used +* The `workaround_bios_bug` boolean is enabled by default. It can be used to ignore errors when parsing the ACPI tables, and finding a listed PCI device which doesn't appear to exist in the system. diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c index 5ecaa10..de3a9eb 100644 --- a/xen/drivers/passthrough/iommu.c +++ b/xen/drivers/passthrough/iommu.c @@ -30,7 +30,7 @@ bool_t __initdata iommu_enable = 1; bool_t __read_mostly iommu_enabled; bool_t __read_mostly force_iommu; bool_t __read_mostly iommu_verbose; -bool_t __read_mostly iommu_workaround_bios_bug; +bool_t __read_mostly iommu_workaround_bios_bug = 1; bool_t __read_mostly iommu_igfx = 1; bool_t __read_mostly iommu_snoop = 1; bool_t __read_mostly iommu_qinval = 1; diff --git a/xen/drivers/passthrough/vtd/dmar.c b/xen/drivers/passthrough/vtd/dmar.c index ac10602..9526ee7 100644 --- a/xen/drivers/passthrough/vtd/dmar.c +++ b/xen/drivers/passthrough/vtd/dmar.c @@ -553,26 +553,13 @@ acpi_parse_one_drhd(struct acpi_dmar_header *header) } } - if ( invalid_cnt ) + if ( invalid_cnt && !iommu_workaround_bios_bug ) { - if ( iommu_workaround_bios_bug && - invalid_cnt == dmaru->scope.devices_cnt ) - { - printk(XENLOG_WARNING VTDPREFIX - " Workaround BIOS bug: ignoring DRHD (no devices in its scope are PCI discoverable)\n"); - - scope_devices_free(&dmaru->scope); - iommu_free(dmaru); - xfree(dmaru); - } - else - { - printk(XENLOG_WARNING VTDPREFIX - " DRHD is invalid (some devices in its scope are not PCI discoverable)\n"); - printk(XENLOG_WARNING VTDPREFIX - " Try \"iommu=force\" or \"iommu=workaround_bios_bug\" if you really want VT-d\n"); - ret = -EINVAL; - } + printk(XENLOG_WARNING VTDPREFIX + " DRHD is invalid (some devices in its scope are not PCI discoverable)\n"); + printk(XENLOG_WARNING VTDPREFIX + " Try \"iommu=force\" or \"iommu=workaround_bios_bug\" if you really want VT-d\n"); + ret = -EINVAL; } else acpi_register_drhd_unit(dmaru);
Since commit dcf41790 ("x86/mmcfg/drhd: Move acpi_mmcfg_init() call before calling acpi_parse_dmar()") PCI segment 0 is now known early which made the sanity check on DRHD definition structure to work. This, in turn, caused a regression on some machines (in particular, HP PowerEdge R740 with I/O AT DMA disabled) where IOMMU was explicitly disabled due to some internal PCI devices being non-discoverable but present in DMAR. While this is indeed a BIOS mistake it seems to be not that critical to disable the whole IOMMU. Instead, extend the scope of "workaround_bios_bug" option and make it enabled by default. This is consistent with our documentation and actually what a user might expect from an option with that name. It also doesn't seem safe to simply ignore DRHD without initialization so remove this case. But leave the original DMAR check in place to still allow error reporting. Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com> --- docs/misc/xen-command-line.pandoc | 2 +- xen/drivers/passthrough/iommu.c | 2 +- xen/drivers/passthrough/vtd/dmar.c | 25 ++++++------------------- 3 files changed, 8 insertions(+), 21 deletions(-)