Message ID | 20170811164320.92899-2-roger.pau@citrix.com (mailing list archive)
---|---
State | New, archived
> From: Roger Pau Monne
> Sent: Saturday, August 12, 2017 12:43 AM
>
> They are emulated by Xen, so they must not be mapped into Dom0 p2m.
> Introduce a helper function to add the MMCFG areas to the list of
> denied iomem regions for PVH Dom0.
>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

this patch is a general fix, not just for inclusive mapping. please send
it separately.

> ---
> Cc: Jan Beulich <jbeulich@suse.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> Changes since RFC:
>  - Introduce as helper instead of exposing the internal mmcfg
>    variables to the Dom0 builder.
> ---
>  xen/arch/x86/dom0_build.c         |  4 ++++
>  xen/arch/x86/x86_64/mmconfig_64.c | 21 +++++++++++++++++++++
>  xen/include/xen/pci.h             |  2 ++
>  3 files changed, 27 insertions(+)
>
> diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
> index 0c125e61eb..3e0910d779 100644
> --- a/xen/arch/x86/dom0_build.c
> +++ b/xen/arch/x86/dom0_build.c
> @@ -440,6 +440,10 @@ int __init dom0_setup_permissions(struct domain *d)
>          rc |= rangeset_add_singleton(mmio_ro_ranges, mfn);
>      }
>
> +    /* For PVH prevent access to the MMCFG areas. */
> +    if ( dom0_pvh )
> +        rc |= pci_mmcfg_set_domain_permissions(d);
> +
>      return rc;
>  }
>
> diff --git a/xen/arch/x86/x86_64/mmconfig_64.c b/xen/arch/x86/x86_64/mmconfig_64.c
> index e84a67dfc4..271fad407f 100644
> --- a/xen/arch/x86/x86_64/mmconfig_64.c
> +++ b/xen/arch/x86/x86_64/mmconfig_64.c
> @@ -15,6 +15,8 @@
>  #include <xen/pci_regs.h>
>  #include <xen/iommu.h>
>  #include <xen/rangeset.h>
> +#include <xen/sched.h>
> +#include <xen/iocap.h>
>
>  #include "mmconfig.h"
>
> @@ -175,6 +177,25 @@ void pci_mmcfg_arch_disable(unsigned int idx)
>             cfg->pci_segment, cfg->start_bus_number, cfg->end_bus_number);
>  }
>
> +int pci_mmcfg_set_domain_permissions(struct domain *d)
> +{
> +    unsigned int idx;
> +    int rc = 0;
> +
> +    for ( idx = 0; idx < pci_mmcfg_config_num; idx++ )
> +    {
> +        const struct acpi_mcfg_allocation *cfg = pci_mmcfg_virt[idx].cfg;
> +        unsigned long start = PFN_DOWN(cfg->address) +
> +                              PCI_BDF(cfg->start_bus_number, 0, 0);
> +        unsigned long end = PFN_DOWN(cfg->address) +
> +                            PCI_BDF(cfg->end_bus_number, ~0, ~0);
> +
> +        rc |= iomem_deny_access(d, start, end);
> +    }
> +
> +    return rc;
> +}
> +
>  bool_t pci_mmcfg_decode(unsigned long mfn, unsigned int *seg,
>                          unsigned int *bdf)
>  {
> diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
> index 59b6e8a81c..ea6a66b248 100644
> --- a/xen/include/xen/pci.h
> +++ b/xen/include/xen/pci.h
> @@ -170,4 +170,6 @@ int msixtbl_pt_register(struct domain *, struct pirq *, uint64_t gtable);
>  void msixtbl_pt_unregister(struct domain *, struct pirq *);
>  void msixtbl_pt_cleanup(struct domain *d);
>
> +int pci_mmcfg_set_domain_permissions(struct domain *d);
> +
>  #endif /* __XEN_PCI_H__ */
> --
> 2.11.0 (Apple Git-81)
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel
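The start/end computation in pci_mmcfg_set_domain_permissions() relies on the ECAM layout: each PCI function owns a 4 KiB configuration window at base + (bus << 20 | slot << 15 | func << 12), so at page granularity a window's frame offset within the MMCFG area is exactly its BDF. A minimal standalone sketch of that arithmetic (plain C, with local stand-ins for Xen's PFN_DOWN/PCI_BDF macros; the base address and bus range match the MCFG entry in the boot log further down the thread):

```c
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12
#define PFN_DOWN(addr) ((addr) >> PAGE_SHIFT)
/* Same bit layout as Xen's PCI_BDF: bus[15:8], slot[7:3], func[2:0]. */
#define PCI_BDF(b, d, f) ((((b) & 0xff) << 8) | (((d) & 0x1f) << 3) | ((f) & 0x7))

int main(void)
{
    /* MCFG entry as reported in the log below: base b0000000, buses 00-3f. */
    uint64_t base = 0xb0000000;
    unsigned int start_bus = 0x00, end_bus = 0x3f;

    /*
     * ECAM puts each function's 4 KiB config window at base + (BDF << 12),
     * so the page-frame offset of a window inside the area is its BDF.
     */
    unsigned long start = PFN_DOWN(base) + PCI_BDF(start_bus, 0, 0);
    unsigned long end = PFN_DOWN(base) + PCI_BDF(end_bus, ~0u, ~0u);

    /* Prints: deny frames 0xb0000-0xb3fff (16384 pages). */
    printf("deny frames %#lx-%#lx (%lu pages)\n", start, end, end - start + 1);
    return 0;
}
```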
On Thu, Aug 17, 2017 at 03:12:02AM +0000, Tian, Kevin wrote:
> > From: Roger Pau Monne
> > Sent: Saturday, August 12, 2017 12:43 AM
> >
> > They are emulated by Xen, so they must not be mapped into Dom0 p2m.
> > Introduce a helper function to add the MMCFG areas to the list of
> > denied iomem regions for PVH Dom0.
> >
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>
> this patch is a general fix, not just for inclusive mapping. please send
> it separately.

Hm, not really.

PV Dom0 should have access to the MMCFG areas, PVH Dom0 shouldn't
because they will be emulated by Xen.

So far MMCFG areas are not mapped into PVH Dom0 p2m, but they will be
once iommu_inclusive_mapping is implemented for PVH Dom0. So I
consider this a preparatory change before enabling
iommu_inclusive_mapping for PVH, rather than a fix. It would be a
fix if iommu_inclusive_mapping was already enabled for PVH Dom0.

Roger.
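As context for "emulated by Xen": a PVH Dom0 access to one of these pages traps into the hypervisor, which must decode the guest-physical address back into a config-space location before emulating the access. A rough sketch of that ECAM decode step (names are illustrative, not Xen's actual handler):

```c
#include <stdint.h>

struct ecam_loc {
    unsigned int bus, slot, func, reg;
};

/*
 * Decode a faulting guest-physical address inside an MMCFG window into
 * bus/slot/function/register. 'base' is the window's base address.
 */
static struct ecam_loc ecam_decode(uint64_t gpa, uint64_t base)
{
    uint64_t off = gpa - base;

    return (struct ecam_loc){
        .bus  = (off >> 20) & 0xff,
        .slot = (off >> 15) & 0x1f,
        .func = (off >> 12) & 0x7,
        .reg  = off & 0xfff,   /* offset within the 4 KiB config space */
    };
}
```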
>>> On 11.08.17 at 18:43, <roger.pau@citrix.com> wrote:
> They are emulated by Xen, so they must not be mapped into Dom0 p2m.
> Introduce a helper function to add the MMCFG areas to the list of
> denied iomem regions for PVH Dom0.

"They are" or "They are going to be"?

> --- a/xen/arch/x86/dom0_build.c
> +++ b/xen/arch/x86/dom0_build.c
> @@ -440,6 +440,10 @@ int __init dom0_setup_permissions(struct domain *d)
>          rc |= rangeset_add_singleton(mmio_ro_ranges, mfn);
>      }
>
> +    /* For PVH prevent access to the MMCFG areas. */
> +    if ( dom0_pvh )
> +        rc |= pci_mmcfg_set_domain_permissions(d);

What about ones reported by Dom0 later on? Which then raises the
question whether ...

> @@ -175,6 +177,25 @@ void pci_mmcfg_arch_disable(unsigned int idx)
>             cfg->pci_segment, cfg->start_bus_number, cfg->end_bus_number);
>  }
>
> +int pci_mmcfg_set_domain_permissions(struct domain *d)
> +{
> +    unsigned int idx;
> +    int rc = 0;
> +
> +    for ( idx = 0; idx < pci_mmcfg_config_num; idx++ )
> +    {
> +        const struct acpi_mcfg_allocation *cfg = pci_mmcfg_virt[idx].cfg;
> +        unsigned long start = PFN_DOWN(cfg->address) +
> +                              PCI_BDF(cfg->start_bus_number, 0, 0);
> +        unsigned long end = PFN_DOWN(cfg->address) +
> +                            PCI_BDF(cfg->end_bus_number, ~0, ~0);
> +
> +        rc |= iomem_deny_access(d, start, end);

... this wouldn't be unnecessary if PVH Dom0, unlike PV Dom0, started
out with no I/O memory being made accessible (i.e. white listing, just
like we decided we would do for other properties of PVH).

Additionally, while in the code that dom0_setup_permissions() was
broken out from using |= was fine, there and here it's not really
appropriate unless we want to continue to bake in the assumption that
either (a) iomem_deny_access() can only ever return a single error
indicator or (b) the callers only care about the value being
(non-)zero.

Jan
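Jan's |= concern is easy to demonstrate: OR-ing two distinct negative errno values yields a third, unrelated one, so only the zero/non-zero distinction survives. A sketch of an accumulation pattern that keeps a real error instead (deny_one_range is a hypothetical stand-in for iomem_deny_access()):

```c
#include <errno.h>

/* Hypothetical stand-in for a call that can fail, e.g. iomem_deny_access(). */
int deny_one_range(unsigned long s, unsigned long e);

/*
 * OR-merging errors loses information: with Linux-style errno numbering,
 * -ENOMEM | -EINVAL == -ENOENT, i.e. two real failures melt into an
 * unrelated third one. Keep the first error instead, but still attempt
 * every range.
 */
int deny_all_ranges(const unsigned long (*ranges)[2], unsigned int nr)
{
    int rc = 0;

    for ( unsigned int i = 0; i < nr; i++ )
    {
        int err = deny_one_range(ranges[i][0], ranges[i][1]);

        if ( err && !rc )
            rc = err;
    }

    return rc;
}
```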
On Tue, Aug 22, 2017 at 06:26:23AM -0600, Jan Beulich wrote:
> >>> On 11.08.17 at 18:43, <roger.pau@citrix.com> wrote:
> > They are emulated by Xen, so they must not be mapped into Dom0 p2m.
> > Introduce a helper function to add the MMCFG areas to the list of
> > denied iomem regions for PVH Dom0.
>
> "They are" or "They are going to be"?

This started as a series on top of vPCI, but I think it has a chance
of getting in before vPCI. I will change it.

> > --- a/xen/arch/x86/dom0_build.c
> > +++ b/xen/arch/x86/dom0_build.c
> > @@ -440,6 +440,10 @@ int __init dom0_setup_permissions(struct domain *d)
> >          rc |= rangeset_add_singleton(mmio_ro_ranges, mfn);
> >      }
> >
> > +    /* For PVH prevent access to the MMCFG areas. */
> > +    if ( dom0_pvh )
> > +        rc |= pci_mmcfg_set_domain_permissions(d);
>
> What about ones reported by Dom0 later on? Which then raises the
> question whether ...

This should be dealt with in the PHYSDEVOP_pci_mmcfg_reserved handler.
But since you propose to do white listing, I guess it doesn't matter
that much anymore.

> > @@ -175,6 +177,25 @@ void pci_mmcfg_arch_disable(unsigned int idx)
> > [...]
> > +        rc |= iomem_deny_access(d, start, end);
>
> ... this wouldn't be unnecessary if PVH Dom0, unlike PV Dom0, started
> out with no I/O memory being made accessible (i.e. white listing, just
> like we decided we would do for other properties of PVH).

So would you like to switch to this white listing mode even for PV
Dom0, or just for PVH?

Should reserved regions and holes be added to it? Maybe only reserved
regions?

> Additionally, while in the code that dom0_setup_permissions() was
> broken out from using |= was fine, there and here it's not really
> appropriate unless we want to continue to bake in the assumption that
> either (a) iomem_deny_access() can only ever return a single error
> indicator or (b) the callers only care about the value being
> (non-)zero.

Right, I can fix that.

Thanks, Roger.
>>> On 22.08.17 at 15:54, <roger.pau@citrix.com> wrote:
> On Tue, Aug 22, 2017 at 06:26:23AM -0600, Jan Beulich wrote:
>> "They are" or "They are going to be"?
>
> This started as a series on top of vPCI, but I think it has a chance
> of getting in before vPCI. I will change it.

I guessed this would be the reason, but while reviewing the vPCI
series you've said somewhere functionality from the series here would
be implied.

>> > --- a/xen/arch/x86/dom0_build.c
>> > +++ b/xen/arch/x86/dom0_build.c
>> > @@ -440,6 +440,10 @@ int __init dom0_setup_permissions(struct domain *d)
>> >          rc |= rangeset_add_singleton(mmio_ro_ranges, mfn);
>> >      }
>> >
>> > +    /* For PVH prevent access to the MMCFG areas. */
>> > +    if ( dom0_pvh )
>> > +        rc |= pci_mmcfg_set_domain_permissions(d);
>>
>> What about ones reported by Dom0 later on? Which then raises the
>> question whether ...
>
> This should be dealt with in the PHYSDEVOP_pci_mmcfg_reserved handler.
> But since you propose to do white listing, I guess it doesn't matter
> that much anymore.

Well, a fundamental question is whether white listing would work in
the first place. I could see room for severe problems e.g. with ACPI
methods wanting to access MMIO that's not described by any PCI
devices' BARs. Typically that would be regions in the chipset which
firmware is responsible for configuring/managing, the addresses of
which can be found/set in custom config space registers.

>> ... this wouldn't be unnecessary if PVH Dom0, unlike PV Dom0, started
>> out with no I/O memory being made accessible (i.e. white listing, just
>> like we decided we would do for other properties of PVH).
>
> So would you like to switch to this white listing mode even for PV
> Dom0, or just for PVH?

No, I certainly don't think we should touch PV here.

> Should reserved regions and holes be added to it? Maybe only reserved
> regions?

See above - reserved regions may be a minimum that's needed to
be added, but then again we can't be certain all BIOSes properly
report everything in use by the chipset/firmware as reserved. Otoh
they're called reserved because no-one outside of the firmware
should touch them.

Jan
On Wed, Aug 23, 2017 at 02:16:38AM -0600, Jan Beulich wrote:
> >>> On 22.08.17 at 15:54, <roger.pau@citrix.com> wrote:
> > This should be dealt with in the PHYSDEVOP_pci_mmcfg_reserved handler.
> > But since you propose to do white listing, I guess it doesn't matter
> > that much anymore.
>
> Well, a fundamental question is whether white listing would work in
> the first place. I could see room for severe problems e.g. with ACPI
> methods wanting to access MMIO that's not described by any PCI
> devices' BARs. Typically that would be regions in the chipset which
> firmware is responsible for configuring/managing, the addresses of
> which can be found/set in custom config space registers.

The question would also be what would Xen allow in such white-listing.
Obviously you can get to map the same using both white-list and
black-listing (see below).

> > So would you like to switch to this white listing mode even for PV
> > Dom0, or just for PVH?
>
> No, I certainly don't think we should touch PV here.
>
> > Should reserved regions and holes be added to it? Maybe only reserved
> > regions?
>
> See above - reserved regions may be a minimum that's needed to
> be added, but then again we can't be certain all BIOSes properly
> report everything in use by the chipset/firmware as reserved. Otoh
> they're called reserved because no-one outside of the firmware
> should touch them.

Right.

On a more general note, I can see your suspicions about this series.
TBH I don't like to implement something like this either: it just
papers over an issue in either the VT-d IOMMU implementation in Xen,
or a hardware erratum in some IOMMUs found on older hardware.

Having said that, I've now tested a slightly less intrusive variant,
which only maps reserved regions. This still requires Xen to blacklist
the MMCFG regions, which reside in reserved areas. Is there anything
else Xen should blacklist from reserved regions?

Roger.
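A schematic of the less intrusive variant Roger describes: identity-map the reserved e820 regions, but keep the MMCFG frames blacklisted so accesses to them still trap for emulation. All helper names here (is_mmcfg_frame, identity_map_frame) are hypothetical stand-ins for Xen's p2m/rangeset machinery:

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT    12
#define E820_RESERVED 2

struct e820entry { uint64_t addr, size; uint32_t type; };

/* Hypothetical helpers standing in for Xen's p2m/rangeset machinery. */
bool is_mmcfg_frame(unsigned long pfn);
int identity_map_frame(unsigned long pfn);

/*
 * Identity-map the reserved e820 regions into the PVH Dom0 p2m, skipping
 * (blacklisting) frames belonging to an MMCFG window, which must stay
 * unmapped so that accesses trap and get emulated.
 */
int map_reserved_regions(const struct e820entry *map, unsigned int nr)
{
    for ( unsigned int i = 0; i < nr; i++ )
    {
        unsigned long pfn = map[i].addr >> PAGE_SHIFT;
        unsigned long end = (map[i].addr + map[i].size) >> PAGE_SHIFT;

        if ( map[i].type != E820_RESERVED )
            continue;

        for ( ; pfn < end; pfn++ )
        {
            int rc;

            if ( is_mmcfg_frame(pfn) )
                continue;

            if ( (rc = identity_map_frame(pfn)) )
                return rc;
        }
    }

    return 0;
}
```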
>>> On 25.08.17 at 14:15, <roger.pau@citrix.com> wrote:
> On Wed, Aug 23, 2017 at 02:16:38AM -0600, Jan Beulich wrote:
>> Well, a fundamental question is whether white listing would work in
>> the first place. I could see room for severe problems e.g. with ACPI
>> methods wanting to access MMIO that's not described by any PCI
>> devices' BARs. Typically that would be regions in the chipset which
>> firmware is responsible for configuring/managing, the addresses of
>> which can be found/set in custom config space registers.
>
> The question would also be what would Xen allow in such white-listing.
> Obviously you can get to map the same using both white-list and
> black-listing (see below).

Not really - what you've said there regarding MMCFG regions is
a clear indication that we should _not_ map reserved regions, i.e.
it would need to be full white listing with perhaps just the PCI
device BARs being handled automatically.

Jan
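A sketch of what "PCI device BARs being handled automatically" could look like: walk a function's BARs, skip I/O BARs, and white-list memory BARs (64-bit BARs consume two slots). The config-space accessor and the white-list call are hypothetical, and BAR sizing is omitted:

```c
#include <stdint.h>

/* Hypothetical config-space accessor (ECAM or 0xcf8/0xcfc underneath). */
uint32_t pci_conf_read32(unsigned int bus, unsigned int slot,
                         unsigned int func, unsigned int reg);
/* Hypothetical: add one MMIO base to the domain's white-list. */
void whitelist_mmio_base(uint64_t base);

#define PCI_BAR0     0x10
#define PCI_NUM_BARS 6

/*
 * Walk the six BARs of one function and white-list the memory ones.
 * The range length would come from the standard BAR sizing protocol
 * (write ~0, read back, restore), omitted here for brevity.
 */
void whitelist_function_bars(unsigned int bus, unsigned int slot,
                             unsigned int func)
{
    for ( unsigned int i = 0; i < PCI_NUM_BARS; i++ )
    {
        uint32_t bar = pci_conf_read32(bus, slot, func, PCI_BAR0 + 4 * i);
        uint64_t base = bar & ~0xfull;

        if ( bar & 1 )
            continue;                  /* I/O space BAR, not MMIO */

        if ( ((bar >> 1) & 3) == 2 )   /* 64-bit memory BAR: uses two slots */
            base |= (uint64_t)pci_conf_read32(bus, slot, func,
                                              PCI_BAR0 + 4 * ++i) << 32;

        if ( base )
            whitelist_mmio_base(base);
    }
}
```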
On Fri, Aug 25, 2017 at 06:25:36AM -0600, Jan Beulich wrote:
> >>> On 25.08.17 at 14:15, <roger.pau@citrix.com> wrote:
> > The question would also be what would Xen allow in such white-listing.
> > Obviously you can get to map the same using both white-list and
> > black-listing (see below).
>
> Not really - what you've said there regarding MMCFG regions is
> a clear indication that we should _not_ map reserved regions, i.e.
> it would need to be full white listing with perhaps just the PCI
> device BARs being handled automatically.

I've tried just mapping the BARs and that sadly doesn't work, the box
hangs after the IOMMU is enabled:

[...]
(XEN) [VT-D]d0:PCI: map 0000:3f:13.5
(XEN) [VT-D]d0:PCI: map 0000:3f:13.6
(XEN) [VT-D]iommu_enable_translation: iommu->reg = ffff82c00021b000

I will park this ATM and leave it for the Intel guys to diagnose.

For the reference, the specific box I'm testing ATM has a Xeon(R) CPU
E5-1607 0 @ 3.00GHz and a C600/X79 chipset.

Roger.
> From: Roger Pau Monne [mailto:roger.pau@citrix.com]
> Sent: Thursday, August 17, 2017 5:32 PM
>
> On Thu, Aug 17, 2017 at 03:12:02AM +0000, Tian, Kevin wrote:
> > this patch is a general fix, not just for inclusive mapping. please send
> > it separately.
>
> Hm, not really.
>
> PV Dom0 should have access to the MMCFG areas, PVH Dom0 shouldn't
> because they will be emulated by Xen.
>
> So far MMCFG areas are not mapped into PVH Dom0 p2m, but they will be
> once iommu_inclusive_mapping is implemented for PVH Dom0. So I
> consider this a preparatory change before enabling
> iommu_inclusive_mapping for PVH, rather than a fix. It would be a
> fix if iommu_inclusive_mapping was already enabled for PVH Dom0.

Possibly you need a better description here. Otherwise the current
description has nothing to do with inclusive mapping, based on which
it looks like a basic PVH Dom0 problem (while from your explanation
it's not valid today).

Thanks
Kevin
> From: Roger Pau Monne [mailto:roger.pau@citrix.com]
> Sent: Friday, August 25, 2017 9:59 PM
>
> On Fri, Aug 25, 2017 at 06:25:36AM -0600, Jan Beulich wrote:
> > Not really - what you've said there regarding MMCFG regions is
> > a clear indication that we should _not_ map reserved regions, i.e.
> > it would need to be full white listing with perhaps just the PCI
> > device BARs being handled automatically.
>
> I've tried just mapping the BARs and that sadly doesn't work, the box
> hangs after the IOMMU is enabled:
>
> [...]
> (XEN) [VT-D]d0:PCI: map 0000:3f:13.5
> (XEN) [VT-D]d0:PCI: map 0000:3f:13.6
> (XEN) [VT-D]iommu_enable_translation: iommu->reg = ffff82c00021b000
>
> I will park this ATM and leave it for the Intel guys to diagnose.
>
> For the reference, the specific box I'm testing ATM has a Xeon(R) CPU
> E5-1607 0 @ 3.00GHz and a C600/X79 chipset.

+Chao who can help check whether we have such a box at hand.

btw please also give your BIOS version.

Thanks
Kevin
On Mon, Aug 28, 2017 at 06:18:13AM +0000, Tian, Kevin wrote:
> > I will park this ATM and leave it for the Intel guys to diagnose.
> >
> > For the reference, the specific box I'm testing ATM has a Xeon(R) CPU
> > E5-1607 0 @ 3.00GHz and a C600/X79 chipset.
>
> +Chao who can help check whether we have such a box at hand.
>
> btw please also give your BIOS version.

It's a Precision T3600 BIOS A14.

Thanks, Roger.
On Tue, Aug 29, 2017 at 08:33:25AM +0100, Roger Pau Monne wrote:
>On Mon, Aug 28, 2017 at 06:18:13AM +0000, Tian, Kevin wrote:
>> +Chao who can help check whether we have such a box at hand.
>>
>> btw please also give your BIOS version.
>
>It's a Precision T3600 BIOS A14.

Hi, Roger.

I found an Ivy Bridge box with E5-2697 v2 and tested with "dom0=pvh",
and the bug didn't occur on this box. The log is below:

(XEN) [    7.509588] [VT-D]d0:PCIe: map 0000:ff:1e.2
(XEN) [    7.511047] [VT-D]d0:PCIe: map 0000:ff:1e.3
(XEN) [    7.512463] [VT-D]d0:PCIe: map 0000:ff:1e.4
(XEN) [    7.513927] [VT-D]d0:PCIe: map 0000:ff:1e.5
(XEN) [    7.515342] [VT-D]d0:PCIe: map 0000:ff:1e.6
(XEN) [    7.516808] [VT-D]d0:PCIe: map 0000:ff:1e.7
(XEN) [    7.519449] [VT-D]iommu_enable_translation: iommu->reg = ffff82c00021b000
(XEN) [    7.522295] [VT-D]iommu_enable_translation: iommu->reg = ffff82c00021d000
(XEN) [    8.675096] OS: linux version: 2.6 loader: generic bitness: 64-bit
(XEN) [    8.726763]
(XEN) [    8.730171] ****************************************
(XEN) [    8.737491] Panic on CPU 0:
(XEN) [    8.742376] Building a PVHv2 Dom0 is not yet supported.
(XEN) [    8.750148] ****************************************
(XEN) [    8.757457]
(XEN) [    8.760877] Reboot in five seconds...
(XEN) [   13.769050] Resetting with ACPI MEMORY or I/O RESET_REG

I agree with you that there may be some bugs in firmware and VT-d.
I did trials on a Haswell box with iommu_inclusive_mapping=false. I did
see a DMA translation fault. The address to be translated is reserved
in the e820 but isn't included in any RMRR. Even so, the box booted up
successfully.

But if the bug exists with PVH Dom0, it also exists with PV Dom0. Could
you try to boot a PV Dom0 with iommu_inclusive_mapping=false?
Theoretically, the system would halt exactly like what you met with
PVH Dom0. Is that right? Or do I miss some difference between PVH Dom0
and PV Dom0?

Thanks
Chao
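For anyone reproducing Chao's suggestion: iommu_inclusive_mapping is a Xen command line option, so it goes on the hypervisor line of the boot entry. A hypothetical GRUB2 stanza mirroring the options visible in Roger's log below (paths and kernel names are placeholders):

```
menuentry 'Xen, inclusive mapping disabled' {
    multiboot /boot/xen.gz dom0_mem=4096M console=com1,vga \
        iommu=debug,verbose iommu_inclusive_mapping=false
    module /boot/vmlinuz root=/dev/sda1 ro
    module /boot/initrd.img
}
```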
On Thu, Aug 31, 2017 at 10:03:19AM +0100, Roger Pau Monne wrote:
>On Thu, Aug 31, 2017 at 03:32:42PM +0800, Chao Gao wrote:
>> I found an Ivy Bridge box with E5-2697 v2 and tested with "dom0=pvh",
>> and the bug didn't occur on this box.
>
>The ones I've seen issues with are Sandy Bridge or Nehalem, can you
>find some of this hardware?

As I expected, I was removed from the recipients :(, which made it
hard for me to notice your replies in time.

Yes, I will. But it may take some time (for even Ivy Bridge is rare).

>
>I haven't tested Ivy Bridge, but all Haswell boxes I've tested seem to
>work just fine.

The reason why I chose Ivy Bridge partly is that you said you found
this bug on almost all pre-Haswell boxes.

Thanks
Chao
On Thu, Aug 31, 2017 at 03:32:42PM +0800, Chao Gao wrote:
> Hi, Roger.
>
> I found an Ivy Bridge box with E5-2697 v2 and tested with "dom0=pvh",
> and the bug didn't occur on this box. The log is below:
>
> [...]
>
> I agree with you that there may be some bugs in firmware and VT-d.
> I did trials on a Haswell box with iommu_inclusive_mapping=false. I did
> see a DMA translation fault. The address to be translated is reserved
> in the e820 but isn't included in any RMRR. Even so, the box booted up
> successfully.
>
> But if the bug exists with PVH Dom0, it also exists with PV Dom0. Could
> you try to boot a PV Dom0 with iommu_inclusive_mapping=false?
> Theoretically, the system would halt exactly like what you met with
> PVH Dom0. Is that right? Or do I miss some difference between PVH Dom0
> and PV Dom0?

Yes, the same happens with iommu_inclusive_mapping=false on PV, the
issue is not PVH specific. Here is the full dmesg:

Xen 4.10-unstable
(XEN) Xen version 4.10-unstable (root@) (FreeBSD clang version 4.0.0 (tags/RELEASE_400/final 297347) (based on LLVM 4.0.0)) debug=y Thu Aug 31 09:47:55 BST 2017
(XEN) Latest ChangeSet:
(XEN) Console output is synchronous.
(XEN) Bootloader: FreeBSD Loader
(XEN) Command line: dom0_mem=4096M com1=115200,8n1 console=com1,vga guest_loglvl=all loglvl=all iommu=debug,verbose sync_console watchdog iommu_inclusive_mapping=false
(XEN) Xen image load base address: 0
(XEN) Video information:
(XEN)  VGA is text mode 80x25, font 8x16
(XEN)  VBE/DDC methods: V2; EDID transfer time: 1 seconds
(XEN) Disc information:
(XEN)  Found 2 MBR signatures
(XEN)  Found 2 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN)  0000000000000000 - 000000000008dc00 (usable)
(XEN)  000000000008dc00 - 00000000000a0000 (reserved)
(XEN)  00000000000e0000 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 0000000018ebb000 (usable)
(XEN)  0000000018ebb000 - 0000000018fe8000 (ACPI NVS)
(XEN)  0000000018fe8000 - 0000000018fe9000 (usable)
(XEN)  0000000018fe9000 - 0000000019000000 (ACPI NVS)
(XEN)  0000000019000000 - 000000001dffd000 (usable)
(XEN)  000000001dffd000 - 000000001e000000 (ACPI data)
(XEN)  000000001e000000 - 00000000ac784000 (usable)
(XEN)  00000000ac784000 - 00000000ac818000 (reserved)
(XEN)  00000000ac818000 - 00000000ad800000 (usable)
(XEN)  00000000b0000000 - 00000000b4000000 (reserved)
(XEN)  00000000fed20000 - 00000000fed40000 (reserved)
(XEN)  00000000fed50000 - 00000000fed90000 (reserved)
(XEN)  00000000ffa00000 - 00000000ffa40000 (reserved)
(XEN)  0000000100000000 - 0000000250000000 (usable)
(XEN) New Xen image base address: 0xad200000
(XEN) ACPI: RSDP 000FE300, 0024 (r2 DELL )
(XEN) ACPI: XSDT 1DFFEE18, 0074 (r1 DELL CBX3 6222004 MSFT 10013)
(XEN) ACPI: FACP 18FEFD98, 00F4 (r4 DELL CBX3 6222004 MSFT 10013)
(XEN) ACPI: DSDT 18FA9018, 6373 (r1 DELL CBX3 0 INTL 20091112)
(XEN) ACPI: FACS 18FF1F40, 0040
(XEN) ACPI: APIC 1DFFDC18, 0158 (r2 DELL CBX3 6222004 MSFT 10013)
(XEN) ACPI: MCFG 18FFED18, 003C (r1 A M I OEMMCFG. 6222004 MSFT 97)
(XEN) ACPI: TCPA 18FFEC98, 0032 (r2 0 0)
(XEN) ACPI: SSDT 18FF0A98, 0306 (r1 DELLTP TPM 3000 INTL 20091112)
(XEN) ACPI: HPET 18FFEC18, 0038 (r1 A M I PCHHPET 6222004 AMI. 3)
(XEN) ACPI: BOOT 18FFEB98, 0028 (r1 DELL CBX3 6222004 AMI 10013)
(XEN) ACPI: SSDT 18FB0018, 36FFE (r2 INTEL CpuPm 4000 INTL 20091112)
(XEN) ACPI: SLIC 18FEEC18, 0176 (r3 DELL CBX3 6222004 MSFT 10013)
(XEN) ACPI: DMAR 18FF1B18, 0094 (r1 A M I OEMDMAR 1 INTL 1)
(XEN) System RAM: 8149MB (8345288kB)
(XEN) No NUMA configuration found
(XEN) Faking a node at 0000000000000000-0000000250000000
(XEN) Domain heap initialised
(XEN) CPU Vendor: Intel, Family 6 (0x6), Model 45 (0x2d), Stepping 7 (raw 000206d7)
(XEN) found SMP MP-table at 000f1db0
(XEN) DMI 2.6 present.
(XEN) Using APIC driver default
(XEN) ACPI: PM-Timer IO Port: 0x408 (32 bits)
(XEN) ACPI: SLEEP INFO: pm1x_cnt[1:404,1:0], pm1x_evt[1:400,1:0]
(XEN) ACPI: 32/64X FACS address mismatch in FADT - 18ffdf40/0000000018ff1f40, using 32
(XEN) ACPI: wakeup_vec[18ffdf4c], vec_size[20]
(XEN) ACPI: Local APIC address 0xfee00000
(XEN) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
(XEN) ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
(XEN) ACPI: LAPIC (acpi_id[0x03] lapic_id[0x04] enabled)
(XEN) ACPI: LAPIC (acpi_id[0x04] lapic_id[0x06] enabled)
(XEN) ACPI: LAPIC (acpi_id[0x05] lapic_id[0x04] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x06] lapic_id[0x05] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x07] lapic_id[0x06] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x08] lapic_id[0x07] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x09] lapic_id[0x08] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x0a] lapic_id[0x09] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x0b] lapic_id[0x0a] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x0c] lapic_id[0x0b] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x0d] lapic_id[0x0c] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x0e] lapic_id[0x0d] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x0f] lapic_id[0x0e] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x10] lapic_id[0x0f] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x11] lapic_id[0x10] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x12] lapic_id[0x11] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x13] lapic_id[0x12] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x14] lapic_id[0x13] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x15] lapic_id[0x14] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x16] lapic_id[0x15] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x17] lapic_id[0x16] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x18] lapic_id[0x17] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x19] lapic_id[0x18] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x1a] lapic_id[0x19] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x1b] lapic_id[0x1a] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x1c] lapic_id[0x1b] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x1d] lapic_id[0x1c] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x1e] lapic_id[0x1d] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x1f] lapic_id[0x1e] disabled)
(XEN) ACPI: LAPIC (acpi_id[0x20] lapic_id[0x1f] disabled)
(XEN) ACPI: IOAPIC (id[0x00] address[0xfec00000] gsi_base[0])
(XEN) IOAPIC[0]: apic_id 0, version 32, address 0xfec00000, GSI 0-23
(XEN) ACPI: IOAPIC (id[0x02] address[0xfec3f000] gsi_base[24])
(XEN) IOAPIC[1]: apic_id 2, version 32, address 0xfec3f000, GSI 24-47
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
(XEN) ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
(XEN) ACPI: IRQ0 used by override.
(XEN) ACPI: IRQ2 used by override.
(XEN) ACPI: IRQ9 used by override.
(XEN) Enabling APIC mode: Flat. Using 2 I/O APICs
(XEN) ACPI: HPET id: 0x8086a701 base: 0xfed00000
(XEN) [VT-D]Host address width 46
(XEN) [VT-D]found ACPI_DMAR_DRHD:
(XEN) [VT-D]  dmaru->address = fbffe000
(XEN) [VT-D]drhd->address = fbffe000 iommu->reg = ffff82c00021b000
(XEN) [VT-D]cap = d2078c106f0462 ecap = f020fa
(XEN) [VT-D] IOAPIC: 0000:00:1f.7
(XEN) [VT-D] IOAPIC: 0000:00:05.4
(XEN) [VT-D] MSI HPET: 0000:f0:0f.0
(XEN) [VT-D]  flags: INCLUDE_ALL
(XEN) [VT-D]found ACPI_DMAR_RMRR:
(XEN) [VT-D] endpoint: 0000:00:1d.0
(XEN) [VT-D] endpoint: 0000:00:1a.0
(XEN) [VT-D]dmar.c:638:   RMRR region: base_addr ac7cf000 end_addr ac7defff
(XEN) [VT-D]found ACPI_DMAR_RHSA:
(XEN) [VT-D]  rhsau->address: fbffe000 rhsau->proximity_domain: 0
(XEN) ERST table was not found
(XEN) Using ACPI (MADT) for SMP configuration information
(XEN) SMP: Allowing 32 CPUs (28 hotplug CPUs)
(XEN) IRQ limits: 48 GSI, 736 MSI/MSI-X
(XEN) Switched to APIC driver x2apic_cluster.
(XEN) xstate: size: 0x340 and states: 0x7
(XEN) mce_intel.c:763: MCA Capability: firstbank 0, extended MCE MSR 0, BCAST, SER, CMCI
(XEN) CPU0: Intel machine check reporting enabled
(XEN) Using scheduler: SMP Credit Scheduler (credit)
(XEN) Platform timer is 14.318MHz HPET
(XEN) Detected 2992.801 MHz processor.
(XEN) Initing memory sharing.
(XEN) alt table ffff82d0804262c0 -> ffff82d080426a34
(XEN) PCI: MCFG configuration 0: base b0000000 segment 0000 buses 00 - 3f
(XEN) PCI: MCFG area at b0000000 reserved in E820
(XEN) PCI: Using MCFG for segment 0000 bus 00-3f
(XEN) Intel VT-d iommu 0 supported page sizes: 4kB, 2MB, 1GB.
(XEN) Intel VT-d Snoop Control enabled.
(XEN) Intel VT-d Dom0 DMA Passthrough not enabled.
(XEN) Intel VT-d Queued Invalidation enabled.
(XEN) Intel VT-d Interrupt Remapping enabled.
(XEN) Intel VT-d Posted Interrupt not enabled.
(XEN) Intel VT-d Shared EPT tables enabled.
(XEN) I/O virtualisation enabled
(XEN)  - Dom0 mode: Relaxed
(XEN) Interrupt remapping enabled
(XEN) nr_sockets: 9
(XEN) Enabled directed EOI with ioapic_ack_old on!
(XEN) ENABLING IO-APIC IRQs
(XEN)  -> Using old ACK method
(XEN) ..TIMER: vector=0xF0 apic1=0 pin1=2 apic2=-1 pin2=-1
(XEN) TSC deadline timer enabled
(XEN) Allocated console ring of 32 KiB.
(XEN) mwait-idle: MWAIT substates: 0x21120
(XEN) mwait-idle: v0.4.1 model 0x2d
(XEN) mwait-idle: lapic_timer_reliable_states 0xffffffff
(XEN) VMX: Supported advanced features:
(XEN)  - APIC MMIO access virtualisation
(XEN)  - APIC TPR shadow
(XEN)  - Extended Page Tables (EPT)
(XEN)  - Virtual-Processor Identifiers (VPID)
(XEN)  - Virtual NMI
(XEN)  - MSR direct-access bitmap
(XEN)  - Unrestricted Guest
(XEN) HVM: ASIDs enabled.
(XEN) HVM: VMX enabled
(XEN) HVM: Hardware Assisted Paging (HAP) detected
(XEN) HVM: HAP page sizes: 4kB, 2MB, 1GB
(XEN) Brought up 4 CPUs
(XEN) Testing NMI watchdog on all CPUs: ok
(XEN) Running stub recovery selftests...
(XEN) traps.c:1530: GPF (0000): ffff82d0bffff041 [ffff82d0bffff041] -> ffff82d08034b0fa
(XEN) traps.c:738: Trap 12: ffff82d0bffff040 [ffff82d0bffff040] -> ffff82d08034b0fa
(XEN) traps.c:1068: Trap 3: ffff82d0bffff041 [ffff82d0bffff041] -> ffff82d08034b0fa
(XEN) ACPI sleep modes: S3
(XEN) VPMU: disabled
(XEN) mcheck_poll: Machine check polling timer started.
(XEN) Dom0 has maximum 624 PIRQs
(XEN) NX (Execute Disable) protection active
(XEN) *** LOADING DOMAIN 0 ***
(XEN) ELF: phdr: paddr=0x200000 memsz=0x14d0dc8
(XEN) ELF: phdr: paddr=0x18d1000 memsz=0x6b05a8
(XEN) ELF: memory: 0x200000 -> 0x1f815a8
(XEN) ELF: note: GUEST_OS = "FreeBSD"
(XEN) ELF: note: GUEST_VERSION = "0x124f9d"
(XEN) ELF: note: XEN_VERSION = "xen-3.0"
(XEN) ELF: note: VIRT_BASE = 0xffffffff80000000
(XEN) ELF: note: PADDR_OFFSET = 0
(XEN) ELF: note: ENTRY = 0xffffffff80eba000
(XEN) ELF: note: HYPERCALL_PAGE = 0xffffffff80eb9000
(XEN) ELF: note: HV_START_LOW = 0xffff800000000000
(XEN) ELF: note: FEATURES = "writable_descriptor_tables|auto_translated_physmap|supervisor_mode_kernel|hvm_callback_vector"
(XEN) ELF: note: PAE_MODE = "yes"
(XEN) ELF: note: unknown (0xd)
(XEN) ELF: note: LOADER = "generic"
(XEN) ELF: note: SUSPEND_CANCEL = 0
(XEN) ELF: note: BSD_SYMTAB = "yes"
(XEN) ELF: note: PHYS32_ENTRY = 0xeba030
(XEN) ELF: using notes from SHT_NOTE section
(XEN) ELF: addresses:
(XEN)  virt_base = 0xffffffff80000000
(XEN)  elf_paddr_offset = 0x0
(XEN)  virt_offset = 0xffffffff80000000
(XEN)  virt_kstart = 0xffffffff80200000
(XEN)  virt_kend = 0xffffffff82272768
(XEN)  virt_entry = 0xffffffff80eba000
(XEN)  p2m_base = 0xffffffffffffffff
(XEN) Xen kernel: 64-bit, lsb, compat32
(XEN) Dom0 kernel: 64-bit, PAE, lsb, paddr 0x200000 -> 0x1f815a8
(XEN) Dom0 symbol map 0x1f815a8 -> 0x2272768
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN)  Dom0 alloc.: 0000000244000000->0000000248000000 (1031217 pages to be allocated)
(XEN)  Init. ramdisk: 000000024fc31000->0000000250000000
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN)  Loaded kernel: ffffffff80200000->ffffffff82272768
(XEN)  Init. ramdisk: ffffffff82273000->ffffffff82642000
(XEN)  Phys-Mach map: ffffffff82642000->ffffffff82e42000
(XEN)  Start info: ffffffff82e42000->ffffffff82e424b4
(XEN)  Page tables: ffffffff82e43000->ffffffff82e5e000
(XEN)  Boot stack: ffffffff82e5e000->ffffffff82e5f000
(XEN)  TOTAL: ffffffff80000000->ffffffff83000000
(XEN)  ENTRY ADDRESS: ffffffff80eba000
(XEN) Dom0 has maximum 4 VCPUs
(XEN) ELF: phdr 2 at 0xffffffff80200000 -> 0xffffffff816d0dc8
(XEN) ELF: phdr 3 at 0xffffffff818d1000 -> 0xffffffff81a1a7b8
(XEN) [VT-D]d0:Hostbridge: skip 0000:00:00.0 map
(XEN) Masked UR signaling on 0000:00:00.0
(XEN) Masked UR signaling on 0000:00:01.0
(XEN) Masked UR signaling on 0000:00:01.1
(XEN) Masked UR signaling on 0000:00:02.0
(XEN) Masked UR signaling on 0000:00:03.0
(XEN) [VT-D]d0:PCIe: map 0000:00:05.0
(XEN) Masked VT-d error signaling on 0000:00:05.0
(XEN) [VT-D]d0:PCIe: map 0000:00:05.2
(XEN) [VT-D]d0:PCI: map 0000:00:05.4
(XEN) [VT-D]d0:PCI: map 0000:00:16.0
(XEN) [VT-D]d0:PCI: map 0000:00:19.0
(XEN) [VT-D]d0:PCI: map 0000:00:1a.0
(XEN) [VT-D]d0:PCIe: map 0000:00:1b.0
(XEN) [VT-D]d0:PCI: map 0000:00:1d.0
(XEN) [VT-D]d0:PCI: map 0000:00:1f.0
(XEN) [VT-D]d0:PCI: map 0000:00:1f.2
(XEN) [VT-D]d0:PCI: map 0000:00:1f.3
(XEN) [VT-D]d0:PCIe: map 0000:03:00.0
(XEN) [VT-D]d0:PCIe: map 0000:03:00.1
(XEN) [VT-D]d0:PCIe: map 0000:05:00.0
(XEN) [VT-D]d0:PCIe: map 0000:05:00.3
(XEN) [VT-D]d0:PCIe: map 0000:07:00.0
(XEN) [VT-D]d0:PCI: map 0000:3f:08.0
(XEN) [VT-D]d0:PCIe: map 0000:3f:08.3
(XEN) [VT-D]d0:PCIe: map 0000:3f:08.4
(XEN) [VT-D]d0:PCI: map 0000:3f:09.0
(XEN) [VT-D]d0:PCIe: map 0000:3f:09.3
(XEN) [VT-D]d0:PCIe: map 0000:3f:09.4
(XEN) [VT-D]d0:PCI: map 0000:3f:0a.0
(XEN) [VT-D]d0:PCI: map 0000:3f:0a.1
(XEN) [VT-D]d0:PCI: map 0000:3f:0a.2
(XEN) [VT-D]d0:PCI: map 0000:3f:0a.3
(XEN) [VT-D]d0:PCI: map 0000:3f:0b.0
(XEN) [VT-D]d0:PCI: map 0000:3f:0b.3
(XEN) [VT-D]d0:PCI: map 0000:3f:0c.0
(XEN) [VT-D]d0:PCI: map 0000:3f:0c.1
(XEN) [VT-D]d0:PCI: map 0000:3f:0c.6
(XEN) [VT-D]d0:PCI: map 0000:3f:0c.7
(XEN) [VT-D]d0:PCI: map 0000:3f:0d.0
(XEN) [VT-D]d0:PCI: map 0000:3f:0d.1
(XEN) [VT-D]d0:PCI: map 0000:3f:0d.6
(XEN) [VT-D]d0:PCI: map 0000:3f:0e.0
(XEN) [VT-D]d0:PCI: map 0000:3f:0e.1
(XEN) [VT-D]d0:PCIe: map 0000:3f:0f.0
(XEN) [VT-D]d0:PCIe: map 0000:3f:0f.1
(XEN) [VT-D]d0:PCIe: map 0000:3f:0f.2
(XEN) [VT-D]d0:PCIe: map 0000:3f:0f.3
(XEN) [VT-D]d0:PCIe: map 0000:3f:0f.4
(XEN) [VT-D]d0:PCIe: map 0000:3f:0f.5
(XEN) [VT-D]d0:PCI: map 0000:3f:0f.6
(XEN) [VT-D]d0:PCIe: map 0000:3f:10.0
(XEN) [VT-D]d0:PCIe: map 0000:3f:10.1
(XEN) [VT-D]d0:PCIe: map 0000:3f:10.2
(XEN) [VT-D]d0:PCIe: map 0000:3f:10.3
(XEN) [VT-D]d0:PCIe: map 0000:3f:10.4
(XEN) [VT-D]d0:PCIe: map 0000:3f:10.5
(XEN) [VT-D]d0:PCIe: map 0000:3f:10.6
(XEN) [VT-D]d0:PCIe: map 0000:3f:10.7
(XEN) [VT-D]d0:PCI: map 0000:3f:11.0
(XEN) [VT-D]d0:PCI: map 0000:3f:13.0
(XEN) [VT-D]d0:PCI: map 0000:3f:13.1
(XEN) [VT-D]d0:PCI: map 0000:3f:13.4
(XEN) [VT-D]d0:PCI: map 0000:3f:13.5
(XEN) [VT-D]d0:PCI: map 0000:3f:13.6
(XEN) [VT-D]iommu_enable_translation: iommu->reg = ffff82c00021b000
On Thu, Aug 31, 2017 at 03:32:42PM +0800, Chao Gao wrote:
> Hi, Roger.
>
> I found an Ivy Bridge box with E5-2697 v2 and tested with "dom0=pvh",
> and the bug didn't occur on this box.

The ones I've seen issues with are Sandy Bridge or Nehalem, can you
find some of this hardware?

I haven't tested Ivy Bridge, but all Haswell boxes I've tested seem to
work just fine.

Thanks, Roger.
On Thu, Aug 31, 2017 at 04:45:23PM +0800, Chao Gao wrote:
> On Thu, Aug 31, 2017 at 10:03:19AM +0100, Roger Pau Monne wrote:
> >The ones I've seen issues with are Sandy Bridge or Nehalem, can you
> >find some of this hardware?
>
> As I expected, I was removed from the recipients :(, which made it
> hard for me to notice your replies in time.

Sorry, I have no idea why my MUA does that, it seems to be able to
deal fine with other recipients.

> Yes, I will. But it may take some time (for even Ivy Bridge is rare).
>
> >I haven't tested Ivy Bridge, but all Haswell boxes I've tested seem to
> >work just fine.
>
> The reason why I chose Ivy Bridge partly is that you said you found
> this bug on almost all pre-Haswell boxes.

I tested Nehalem, Sandy Bridge and Haswell, but sadly not Ivy Bridge
(in fact I didn't even know about Ivy Bridge, that's why I said all
pre-Haswell). In fact I'm now trying with a Nehalem processor that
seems to work, so whatever this issue is it certainly doesn't affect
all models or chipsets.

Thanks, Roger.
On Thu, Aug 31, 2017 at 11:09:48AM +0100, Roger Pau Monne wrote:
>[...]
>I tested Nehalem, Sandy Bridge and Haswell, but sadly not Ivy Bridge
>(in fact I didn't even know about Ivy Bridge, which is why I said all
>pre-Haswell).
>
>In fact I'm now trying with a Nehalem processor that seems to work, so
>whatever this issue is it certainly doesn't affect all models or
>chipsets.

Hi, Roger.

Last week, I borrowed a Sandy Bridge box with an Intel(R) Xeon(R)
E5-2690 @ 2.7GHz and tested with 'dom0=pvh', but I didn't see that
machine hang.

I also tested on Haswell and found that the RMRRs in the DMAR are
incorrect on my Haswell box. The e820 on that machine is:
(XEN) [ 0.000000] Xen-e820 RAM map:
(XEN) [ 0.000000]  0000000000000000 - 000000000009a400 (usable)
(XEN) [ 0.000000]  000000000009a400 - 00000000000a0000 (reserved)
(XEN) [ 0.000000]  00000000000e0000 - 0000000000100000 (reserved)
(XEN) [ 0.000000]  0000000000100000 - 000000006ff84000 (usable)
(XEN) [ 0.000000]  000000006ff84000 - 000000007ac51000 (reserved)
(XEN) [ 0.000000]  000000007ac51000 - 000000007b681000 (ACPI NVS)
(XEN) [ 0.000000]  000000007b681000 - 000000007b7cf000 (ACPI data)
(XEN) [ 0.000000]  000000007b7cf000 - 000000007b800000 (usable)
(XEN) [ 0.000000]  000000007b800000 - 0000000090000000 (reserved)
(XEN) [ 0.000000]  00000000fed1c000 - 00000000fed20000 (reserved)
(XEN) [ 0.000000]  00000000ff400000 - 0000000100000000 (reserved)
(XEN) [ 0.000000]  0000000100000000 - 0000002080000000 (usable)

And the RMRRs in the DMAR are:
(XEN) [ 0.000000] [VT-D]found ACPI_DMAR_RMRR:
(XEN) [ 0.000000] [VT-D] endpoint: 0000:05:00.0
(XEN) [ 0.000000] [VT-D]dmar.c:638: RMRR region: base_addr 723b4000 end_addr 7a3f3fff
(XEN) [ 0.000000] [VT-D]found ACPI_DMAR_RMRR:
(XEN) [ 0.000000] [VT-D] endpoint: 0000:00:1d.0
(XEN) [ 0.000000] [VT-D] endpoint: 0000:00:1a.0
(XEN) [ 0.000000] [VT-D]dmar.c:638: RMRR region: base_addr 723ac000 end_addr 723aefff
(Endpoint 05:00.0 is a RAID bus controller. Endpoints 00:1d.0 and
00:1a.0 are USB controllers.)

After DMA remapping is enabled, two DMA translation faults are reported
by VT-d:
(XEN) [ 9.547924] [VT-D]iommu_enable_translation: iommu->reg = ffff82c00021b000
(XEN) [ 9.550620] [VT-D]iommu_enable_translation: iommu->reg = ffff82c00021d000
(XEN) [ 9.553327] [VT-D]iommu.c:921: iommu_fault_status: Primary Pending Fault
(XEN) [ 9.555906] [VT-D]DMAR:[DMA Read] Request device [0000:00:1a.0] fault addr 7a3f5000, iommu reg = ffff82c00021d000
(XEN) [ 9.558537] [VT-D]DMAR: reason 06 - PTE Read access is not set
(XEN) [ 9.559860] print_vtd_entries: iommu #1 dev 0000:00:1a.0 gmfn 7a3f5
(XEN) [ 9.561179]     root_entry[00] = 107277c001
(XEN) [ 9.562447]     context[d0] = 2_1072c06001
(XEN) [ 9.563776]     l4[000] = 9c0000202f171107
(XEN) [ 9.565125]     l3[001] = 9c0000202f152107
(XEN) [ 9.566483]     l2[1d1] = 9c000010727ce107
(XEN) [ 9.567821]     l1[1f5] = 8000000000000000
(XEN) [ 9.569168]     l1[1f5] not present
(XEN) [ 9.570502] [VT-D]DMAR:[DMA Read] Request device [0000:00:1d.0] fault addr 7a3f4000, iommu reg = ffff82c00021d000
(XEN) [ 9.573147] [VT-D]DMAR: reason 06 - PTE Read access is not set
(XEN) [ 9.574488] print_vtd_entries: iommu #1 dev 0000:00:1d.0 gmfn 7a3f4
(XEN) [ 9.575819]     root_entry[00] = 107277c001
(XEN) [ 9.577129]     context[e8] = 2_1072c06001
(XEN) [ 9.578439]     l4[000] = 9c0000202f171107
(XEN) [ 9.579778]     l3[001] = 9c0000202f152107
(XEN) [ 9.581111]     l2[1d1] = 9c000010727ce107
(XEN) [ 9.582482]     l1[1f4] = 8000000000000000
(XEN) [ 9.583812]     l1[1f4] not present
(XEN) [ 10.520172] Unable to find XEN_ELFNOTE_PHYS32_ENTRY address
(XEN) [ 10.521499] Failed to load Dom0 kernel
(XEN) [ 10.532171]
(XEN) [ 10.535464] ****************************************
(XEN) [ 10.542636] Panic on CPU 0:
(XEN) [ 10.547394] Could not set up DOM0 guest OS
(XEN) [ 10.553605] ****************************************

The fault address the devices failed to access is marked as reserved in
the e820 and isn't reserved for those devices according to the RMRRs in
the DMAR. So I think we can conclude that some existing BIOSes don't
expose correct RMRRs to the OS via the DMAR, and that we need a
workaround such as iommu_inclusive_mapping to deal with this kind of
BIOS for both PV Dom0 and PVH Dom0.

As to the machine hang Roger observed, I have no idea about the cause.
Roger, have you ever seen the VT-d on that machine report a DMA
translation fault? If not, can you trigger one natively? I think this
can tell us whether the hardware's fault reporting works correctly or
whether there are bugs in the Xen code. What do you think of this
approach?

Thanks,
Chao
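[For illustration, a minimal, self-contained sketch (plain C, not Xen
code) of the policy a workaround like iommu_inclusive_mapping implies:
walk the e820 and identity-map everything below 4GB that is not RAM, so
firmware DMA to reserved regions missing from the RMRRs keeps working.
The table is the e820 of the Haswell box above; the entry types,
helper, and output format are simplifying assumptions, not Xen's actual
implementation.]

/*
 * Sketch of an "inclusive mapping" policy over the e820 reported above.
 */
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

#define E820_RAM      1
#define E820_RESERVED 2
#define E820_ACPI     3
#define E820_NVS      4

struct e820entry { uint64_t addr, size; unsigned int type; };

static const struct e820entry e820[] = {
    { 0x0000000000000000ULL, 0x000000000009a400ULL, E820_RAM },
    { 0x000000000009a400ULL, 0x0000000000005c00ULL, E820_RESERVED },
    { 0x00000000000e0000ULL, 0x0000000000020000ULL, E820_RESERVED },
    { 0x0000000000100000ULL, 0x000000006fe84000ULL, E820_RAM },
    { 0x000000006ff84000ULL, 0x000000000accd000ULL, E820_RESERVED },
    { 0x000000007ac51000ULL, 0x0000000000a30000ULL, E820_NVS },
    { 0x000000007b681000ULL, 0x000000000014e000ULL, E820_ACPI },
    { 0x000000007b7cf000ULL, 0x0000000000031000ULL, E820_RAM },
    { 0x000000007b800000ULL, 0x0000000014800000ULL, E820_RESERVED },
    { 0x00000000fed1c000ULL, 0x0000000000004000ULL, E820_RESERVED },
    { 0x00000000ff400000ULL, 0x0000000000c00000ULL, E820_RESERVED },
    { 0x0000000100000000ULL, 0x0000001f80000000ULL, E820_RAM },
};

int main(void)
{
    for (unsigned int i = 0; i < sizeof(e820) / sizeof(e820[0]); i++) {
        /*
         * RAM is mapped through the normal Dom0 path; an inclusive
         * policy additionally identity-maps the non-RAM ranges below
         * 4GB, which covers the faulting addresses 7a3f4000/7a3f5000
         * (inside the reserved 6ff84000-7ac51000 region).
         */
        if (e820[i].type == E820_RAM || e820[i].addr >= (1ULL << 32))
            continue;
        printf("identity map %016" PRIx64 "-%016" PRIx64 "\n",
               e820[i].addr, e820[i].addr + e820[i].size - 1);
    }
    return 0;
}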
On Mon, Sep 04, 2017 at 10:26:04AM +0100, Roger Pau Monné wrote:
>(Adding Chao again because my MUA seems to drop him each time)
>
>On Mon, Sep 04, 2017 at 10:00:00AM +0100, Roger Pau Monné wrote:
>> On Mon, Sep 04, 2017 at 02:25:10PM +0800, Chao Gao wrote:
>> > [...]
>> > The fault address the devices failed to access is marked as reserved
>> > in the e820 and isn't reserved for those devices according to the
>> > RMRRs in the DMAR. [...]
>>
>> So your box seems to be capable of generating faults. Missing RMRR
>> regions is (sadly) expected, but at least you get faults and not a
>> complete hang. Which chipset does this box have? Is it a C600/X79?

No. The Haswell box's chipset is a C610/X99.

>>
>> > As to the machine hang Roger observed, I have no idea about the
>> > cause. Roger, have you ever seen the VT-d on that machine report a
>> > DMA translation fault? If not, can you trigger one natively? [...]
>>
>> Is there any simple way to create such a fault? Does the IOMMU have
>> some kind of self-test thing that can be used to generate a synthetic
>> fault?

I don't know of such a tool. Maybe you can hack the IOMMU driver.

Thanks,
Chao
On Mon, Sep 04, 2017 at 02:25:10PM +0800, Chao Gao wrote:
> On Thu, Aug 31, 2017 at 11:09:48AM +0100, Roger Pau Monne wrote:
> >[...]
>
> Hi, Roger.
>
> Last week, I borrowed a Sandy Bridge box with an Intel(R) Xeon(R)
> E5-2690 @ 2.7GHz and tested with 'dom0=pvh', but I didn't see that
> machine hang.
>
> I also tested on Haswell and found that the RMRRs in the DMAR are
> incorrect on my Haswell box. [...]
>
> The fault address the devices failed to access is marked as reserved in
> the e820 and isn't reserved for those devices according to the RMRRs in
> the DMAR. So I think we can conclude that some existing BIOSes don't
> expose correct RMRRs to the OS via the DMAR, and that we need a
> workaround such as iommu_inclusive_mapping to deal with this kind of
> BIOS for both PV Dom0 and PVH Dom0.

So your box seems to be capable of generating faults. Missing RMRR
regions is (sadly) expected, but at least you get faults and not a
complete hang. Which chipset does this box have? Is it a C600/X79?

> As to the machine hang Roger observed, I have no idea about the cause.
> Roger, have you ever seen the VT-d on that machine report a DMA
> translation fault? If not, can you trigger one natively? I think this
> can tell us whether the hardware's fault reporting works correctly or
> whether there are bugs in the Xen code. What do you think of this
> approach?

Is there any simple way to create such a fault? Does the IOMMU have
some kind of self-test thing that can be used to generate a synthetic
fault?

Thanks, Roger.
(Adding Chao again because my MUA seems to drop him each time)

On Mon, Sep 04, 2017 at 10:00:00AM +0100, Roger Pau Monné wrote:
> On Mon, Sep 04, 2017 at 02:25:10PM +0800, Chao Gao wrote:
> > [...]
> > The fault address the devices failed to access is marked as reserved
> > in the e820 and isn't reserved for those devices according to the
> > RMRRs in the DMAR. [...]
>
> So your box seems to be capable of generating faults. Missing RMRR
> regions is (sadly) expected, but at least you get faults and not a
> complete hang. Which chipset does this box have? Is it a C600/X79?
>
> > As to the machine hang Roger observed, I have no idea about the
> > cause. Roger, have you ever seen the VT-d on that machine report a
> > DMA translation fault? If not, can you trigger one natively? [...]
>
> Is there any simple way to create such a fault? Does the IOMMU have
> some kind of self-test thing that can be used to generate a synthetic
> fault?
>
> Thanks, Roger.
On Mon, Sep 04, 2017 at 04:52:35PM +0800, Chao Gao wrote:
> On Mon, Sep 04, 2017 at 10:26:04AM +0100, Roger Pau Monné wrote:
> >[...]
> >> So your box seems to be capable of generating faults. Missing RMRR
> >> regions is (sadly) expected, but at least you get faults and not a
> >> complete hang. Which chipset does this box have? Is it a C600/X79?
>
> No. The Haswell box's chipset is a C610/X99.

Can you try with the C600/X79 chipset? I'm afraid the issue is probably
more related to the chipset than to the CPU itself.

> >> Is there any simple way to create such a fault? Does the IOMMU have
> >> some kind of self-test thing that can be used to generate a synthetic
> >> fault?
>
> I don't know of such a tool. Maybe you can hack the IOMMU driver.

Hm, OK, it doesn't look very easy to implement something like this.
Will try to find some time, but I'm fairly busy ATM.

Roger.
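[A standalone illustration of the "hack the IOMMU driver" idea: a VT-d
leaf PTE with its low Read/Write permission bits clear makes any DMA to
that page fault with "PTE Read access is not set", which is the exact
signature in the logs above (l1[1f5] = 8000000000000000, i.e. R = W = 0).
In Xen one would presumably clear these bits on a live Dom0 mapping and
watch whether the fault is reported; the snippet only demonstrates the
bit manipulation, with a sample value taken from the fault dump, and is
not an actual Xen patch.]

#include <stdio.h>
#include <stdint.h>

/* Low permission bits of a VT-d second-level (DMA) PTE. */
#define DMA_PTE_READ  (1ULL << 0)
#define DMA_PTE_WRITE (1ULL << 1)

static void decode(uint64_t pte)
{
    printf("pte %016llx: %s\n", (unsigned long long)pte,
           (pte & (DMA_PTE_READ | DMA_PTE_WRITE))
           ? "present" : "not present (DMA would fault)");
}

int main(void)
{
    uint64_t pte = 0x9c000010727ce107ULL;          /* entry from the log */

    decode(pte);                                   /* present */
    decode(pte & ~(DMA_PTE_READ | DMA_PTE_WRITE)); /* synthetic fault */
    return 0;
}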
OK, I know why my MUA doesn't add your email to the To or Cc when
replying: your original email contains the following header:

Mail-Followup-To: Roger Pau =?iso-8859-1?Q?Monn=E9?= <roger.pau@citrix.com>,
    "Tian, Kevin" <kevin.tian@intel.com>, Jan Beulich <JBeulich@suse.com>,
    Andrew Cooper <andrew.cooper3@citrix.com>,
    "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>

When replying, the addresses in this header are placed in the "To"
field, and as you can see your address is missing from the list. So
either add your address there, or stop setting "Mail-Followup-To".

Roger.

On Mon, Sep 04, 2017 at 04:06:51PM +0100, Roger Pau Monné wrote:
> On Mon, Sep 04, 2017 at 04:52:35PM +0800, Chao Gao wrote:
> > [...]
> > No. The Haswell box's chipset is a C610/X99.
>
> Can you try with the C600/X79 chipset? I'm afraid the issue is probably
> more related to the chipset than to the CPU itself.
>
> > [...]
> > I don't know of such a tool. Maybe you can hack the IOMMU driver.
>
> Hm, OK, it doesn't look very easy to implement something like this.
> Will try to find some time, but I'm fairly busy ATM.
>
> Roger.
>>> On 04.09.17 at 17:06, <roger.pau@citrix.com> wrote:
> On Mon, Sep 04, 2017 at 04:52:35PM +0800, Chao Gao wrote:
>> On Mon, Sep 04, 2017 at 10:26:04AM +0100, Roger Pau Monné wrote:
>> >On Mon, Sep 04, 2017 at 10:00:00AM +0100, Roger Pau Monné wrote:
>> >> So your box seems to be capable of generating faults. Missing RMRR
>> >> regions is (sadly) expected, but at least you get faults and not a
>> >> complete hang. Which chipset does this box have? Is it a C600/X79?
>>
>> No. The Haswell box's chipset is a C610/X99.
>
> Can you try with the C600/X79 chipset? I'm afraid the issue is probably
> more related to the chipset than to the CPU itself.

Or even the firmware.

Jan
diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
index 0c125e61eb..3e0910d779 100644
--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -440,6 +440,10 @@ int __init dom0_setup_permissions(struct domain *d)
             rc |= rangeset_add_singleton(mmio_ro_ranges, mfn);
     }
 
+    /* For PVH prevent access to the MMCFG areas. */
+    if ( dom0_pvh )
+        rc |= pci_mmcfg_set_domain_permissions(d);
+
     return rc;
 }
 
diff --git a/xen/arch/x86/x86_64/mmconfig_64.c b/xen/arch/x86/x86_64/mmconfig_64.c
index e84a67dfc4..271fad407f 100644
--- a/xen/arch/x86/x86_64/mmconfig_64.c
+++ b/xen/arch/x86/x86_64/mmconfig_64.c
@@ -15,6 +15,8 @@
 #include <xen/pci_regs.h>
 #include <xen/iommu.h>
 #include <xen/rangeset.h>
+#include <xen/sched.h>
+#include <xen/iocap.h>
 
 #include "mmconfig.h"
 
@@ -175,6 +177,25 @@ void pci_mmcfg_arch_disable(unsigned int idx)
            cfg->pci_segment, cfg->start_bus_number, cfg->end_bus_number);
 }
 
+int pci_mmcfg_set_domain_permissions(struct domain *d)
+{
+    unsigned int idx;
+    int rc = 0;
+
+    for ( idx = 0; idx < pci_mmcfg_config_num; idx++ )
+    {
+        const struct acpi_mcfg_allocation *cfg = pci_mmcfg_virt[idx].cfg;
+        unsigned long start = PFN_DOWN(cfg->address) +
+                              PCI_BDF(cfg->start_bus_number, 0, 0);
+        unsigned long end = PFN_DOWN(cfg->address) +
+                            PCI_BDF(cfg->end_bus_number, ~0, ~0);
+
+        rc |= iomem_deny_access(d, start, end);
+    }
+
+    return rc;
+}
+
 bool_t pci_mmcfg_decode(unsigned long mfn, unsigned int *seg,
                         unsigned int *bdf)
 {
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 59b6e8a81c..ea6a66b248 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -170,4 +170,6 @@ int msixtbl_pt_register(struct domain *, struct pirq *, uint64_t gtable);
 void msixtbl_pt_unregister(struct domain *, struct pirq *);
 void msixtbl_pt_cleanup(struct domain *d);
 
+int pci_mmcfg_set_domain_permissions(struct domain *d);
+
 #endif /* __XEN_PCI_H__ */
They are emulated by Xen, so they must not be mapped into Dom0 p2m.
Introduce a helper function to add the MMCFG areas to the list of
denied iomem regions for PVH Dom0.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes since RFC:
 - Introduce as helper instead of exposing the internal mmcfg
   variables to the Dom0 builder.
---
 xen/arch/x86/dom0_build.c         |  4 ++++
 xen/arch/x86/x86_64/mmconfig_64.c | 21 +++++++++++++++++++++
 xen/include/xen/pci.h             |  2 ++
 3 files changed, 27 insertions(+)
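[As a sanity check of the arithmetic in pci_mmcfg_set_domain_permissions(),
a small standalone program showing how the denied frame range is derived:
in an MMCFG/ECAM region every (bus, device, function) triple owns one 4KB
page of config space, so the page offset from the base frame is exactly
the BDF value. The MCFG entry used (base 0xe0000000, buses 0-255) is a
hypothetical example; the macros mirror the Xen ones used by the patch.]

#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT       12
#define PFN_DOWN(x)      ((x) >> PAGE_SHIFT)
#define PCI_DEVFN(d, f)  ((((d) & 0x1f) << 3) | ((f) & 7))
#define PCI_BDF(b, d, f) ((((b) & 0xff) << 8) | PCI_DEVFN(d, f))

int main(void)
{
    uint64_t address = 0xe0000000ULL;   /* assumed MMCFG base address */
    unsigned int start_bus = 0, end_bus = 255;

    unsigned long start = PFN_DOWN(address) + PCI_BDF(start_bus, 0, 0);
    unsigned long end = PFN_DOWN(address) + PCI_BDF(end_bus, ~0u, ~0u);

    /* 256 buses x 256 pages/bus: frames 0xe0000-0xeffff, i.e. 256MB. */
    printf("deny frames %#lx-%#lx (%lu pages)\n", start, end,
           end - start + 1);
    return 0;
}

On these terms, iomem_deny_access(d, start, end) then blocks the whole
MMCFG window from a PVH Dom0, leaving config space accesses to Xen's
emulation.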