Message ID | 1588653736-10835-1-git-send-email-ashok.raj@intel.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers | show |
Series | iommu: Relax ACS requirement for RCiEP devices. | expand |
On Mon, 4 May 2020 21:42:16 -0700 Ashok Raj <ashok.raj@intel.com> wrote: > PCIe Spec recommends we can relax ACS requirement for RCIEP devices. > > PCIe 5.0 Specification. > 6.12 Access Control Services (ACS) > Implementation of ACS in RCiEPs is permitted but not required. It is > explicitly permitted that, within a single Root Complex, some RCiEPs > implement ACS and some do not. It is strongly recommended that Root Complex > implementations ensure that all accesses originating from RCiEPs > (PFs and VFs) without ACS capability are first subjected to processing by > the Translation Agent (TA) in the Root Complex before further decoding and > processing. The details of such Root Complex handling are outside the scope > of this specification. > > Since Linux didn't give special treatment to allow this exception, certain > RCiEP MFD devices are getting grouped in a single iommu group. This > doesn't permit a single device to be assigned to a guest for instance. > > In one vendor system: Device 14.x were grouped in a single IOMMU group. > > /sys/kernel/iommu_groups/5/devices/0000:00:14.0 > /sys/kernel/iommu_groups/5/devices/0000:00:14.2 > /sys/kernel/iommu_groups/5/devices/0000:00:14.3 > > After the patch: > /sys/kernel/iommu_groups/5/devices/0000:00:14.0 > /sys/kernel/iommu_groups/5/devices/0000:00:14.2 > /sys/kernel/iommu_groups/6/devices/0000:00:14.3 <<< new group > > 14.0 and 14.2 are integrated devices, but legacy end points. > Whereas 14.3 was a PCIe compliant RCiEP. > > 00:14.3 Network controller: Intel Corporation Device 9df0 (rev 30) > Capabilities: [40] Express (v2) Root Complex Integrated Endpoint, MSI 00 > > This permits assigning this device to a guest VM. > > Fixes: f096c061f552 ("iommu: Rework iommu_group_get_for_pci_dev()") > Signed-off-by: Ashok Raj <ashok.raj@intel.com> > To: Joerg Roedel <joro@8bytes.org> > To: Bjorn Helgaas <bhelgaas@google.com> > Cc: linux-kernel@vger.kernel.org > Cc: iommu@lists.linux-foundation.org > Cc: Lu Baolu <baolu.lu@linux.intel.com> > Cc: Alex Williamson <alex.williamson@redhat.com> > Cc: Darrel Goeddel <DGoeddel@forcepoint.com> > Cc: Mark Scott <mscott@forcepoint.com>, > Cc: Romil Sharma <rsharma@forcepoint.com> > Cc: Ashok Raj <ashok.raj@intel.com> > --- > drivers/iommu/iommu.c | 15 ++++++++++++++- > 1 file changed, 14 insertions(+), 1 deletion(-) > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > index 2b471419e26c..5744bd65f3e2 100644 > --- a/drivers/iommu/iommu.c > +++ b/drivers/iommu/iommu.c > @@ -1187,7 +1187,20 @@ static struct iommu_group *get_pci_function_alias_group(struct pci_dev *pdev, > struct pci_dev *tmp = NULL; > struct iommu_group *group; > > - if (!pdev->multifunction || pci_acs_enabled(pdev, REQ_ACS_FLAGS)) > + /* > + * PCI Spec 5.0, Section 6.12 Access Control Service > + * Implementation of ACS in RCiEPs is permitted but not required. > + * It is explicitly permitted that, within a single Root > + * Complex, some RCiEPs implement ACS and some do not. It is > + * strongly recommended that Root Complex implementations ensure > + * that all accesses originating from RCiEPs (PFs and VFs) without > + * ACS capability are first subjected to processing by the Translation > + * Agent (TA) in the Root Complex before further decoding and > + * processing. > + */ Is the language here really strong enough to make this change? ACS is an optional feature, so being permitted but not required is rather meaningless. The spec is also specifically avoiding the words "must" or "shall" and even when emphasized with "strongly", we still only have a recommendation that may or may not be honored. This seems like a weak basis for assuming that RCiEPs universally honor this recommendation. Thanks, Alex > + if (!pdev->multifunction || > + (pci_pcie_type(pdev) == PCI_EXP_TYPE_RC_END) || > + pci_acs_enabled(pdev, REQ_ACS_FLAGS)) > return NULL; > > for_each_pci_dev(tmp) {
Hi Alex + Joerg, accidently missed in the Cc. On Mon, May 04, 2020 at 11:19:36PM -0600, Alex Williamson wrote: > On Mon, 4 May 2020 21:42:16 -0700 > Ashok Raj <ashok.raj@intel.com> wrote: > > > PCIe Spec recommends we can relax ACS requirement for RCIEP devices. > > > > PCIe 5.0 Specification. > > 6.12 Access Control Services (ACS) > > Implementation of ACS in RCiEPs is permitted but not required. It is > > explicitly permitted that, within a single Root Complex, some RCiEPs > > implement ACS and some do not. It is strongly recommended that Root Complex > > implementations ensure that all accesses originating from RCiEPs > > (PFs and VFs) without ACS capability are first subjected to processing by > > the Translation Agent (TA) in the Root Complex before further decoding and > > processing. The details of such Root Complex handling are outside the scope > > of this specification. > > > > Since Linux didn't give special treatment to allow this exception, certain > > RCiEP MFD devices are getting grouped in a single iommu group. This > > doesn't permit a single device to be assigned to a guest for instance. > > > > In one vendor system: Device 14.x were grouped in a single IOMMU group. > > > > /sys/kernel/iommu_groups/5/devices/0000:00:14.0 > > /sys/kernel/iommu_groups/5/devices/0000:00:14.2 > > /sys/kernel/iommu_groups/5/devices/0000:00:14.3 > > > > After the patch: > > /sys/kernel/iommu_groups/5/devices/0000:00:14.0 > > /sys/kernel/iommu_groups/5/devices/0000:00:14.2 > > /sys/kernel/iommu_groups/6/devices/0000:00:14.3 <<< new group > > > > 14.0 and 14.2 are integrated devices, but legacy end points. > > Whereas 14.3 was a PCIe compliant RCiEP. > > > > 00:14.3 Network controller: Intel Corporation Device 9df0 (rev 30) > > Capabilities: [40] Express (v2) Root Complex Integrated Endpoint, MSI 00 > > > > This permits assigning this device to a guest VM. > > > > Fixes: f096c061f552 ("iommu: Rework iommu_group_get_for_pci_dev()") > > Signed-off-by: Ashok Raj <ashok.raj@intel.com> > > To: Joerg Roedel <joro@8bytes.org> > > To: Bjorn Helgaas <bhelgaas@google.com> > > Cc: linux-kernel@vger.kernel.org > > Cc: iommu@lists.linux-foundation.org > > Cc: Lu Baolu <baolu.lu@linux.intel.com> > > Cc: Alex Williamson <alex.williamson@redhat.com> > > Cc: Darrel Goeddel <DGoeddel@forcepoint.com> > > Cc: Mark Scott <mscott@forcepoint.com>, > > Cc: Romil Sharma <rsharma@forcepoint.com> > > Cc: Ashok Raj <ashok.raj@intel.com> > > --- > > drivers/iommu/iommu.c | 15 ++++++++++++++- > > 1 file changed, 14 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > > index 2b471419e26c..5744bd65f3e2 100644 > > --- a/drivers/iommu/iommu.c > > +++ b/drivers/iommu/iommu.c > > @@ -1187,7 +1187,20 @@ static struct iommu_group *get_pci_function_alias_group(struct pci_dev *pdev, > > struct pci_dev *tmp = NULL; > > struct iommu_group *group; > > > > - if (!pdev->multifunction || pci_acs_enabled(pdev, REQ_ACS_FLAGS)) > > + /* > > + * PCI Spec 5.0, Section 6.12 Access Control Service > > + * Implementation of ACS in RCiEPs is permitted but not required. > > + * It is explicitly permitted that, within a single Root > > + * Complex, some RCiEPs implement ACS and some do not. It is > > + * strongly recommended that Root Complex implementations ensure > > + * that all accesses originating from RCiEPs (PFs and VFs) without > > + * ACS capability are first subjected to processing by the Translation > > + * Agent (TA) in the Root Complex before further decoding and > > + * processing. > > + */ > > Is the language here really strong enough to make this change? ACS is > an optional feature, so being permitted but not required is rather > meaningless. The spec is also specifically avoiding the words "must" > or "shall" and even when emphasized with "strongly", we still only have > a recommendation that may or may not be honored. This seems like a > weak basis for assuming that RCiEPs universally honor this > recommendation. Thanks, > We are speaking about PCIe spec, where people write it about 5 years ahead and every vendor tries to massage their product behavior with vague words like this.. :) But honestly for any any RCiEP, or even integrated endpoints, there is no way to send them except up north. These aren't behind a RP. I did check with couple folks who are part of the SIG, and seem to agree that ACS treatment for RCiEP's doesn't mean much. I understand the language isn't strong, but it doesn't seem like ACS should be a strong requirement for RCiEP's and reasonable to relax. What are your thoughts? Cheers, Ashok > > > + if (!pdev->multifunction || > > + (pci_pcie_type(pdev) == PCI_EXP_TYPE_RC_END) || > > + pci_acs_enabled(pdev, REQ_ACS_FLAGS)) > > return NULL; > > > > for_each_pci_dev(tmp) { >
On Mon, 4 May 2020 23:11:07 -0700 "Raj, Ashok" <ashok.raj@intel.com> wrote: > Hi Alex > > + Joerg, accidently missed in the Cc. > > On Mon, May 04, 2020 at 11:19:36PM -0600, Alex Williamson wrote: > > On Mon, 4 May 2020 21:42:16 -0700 > > Ashok Raj <ashok.raj@intel.com> wrote: > > > > > PCIe Spec recommends we can relax ACS requirement for RCIEP devices. > > > > > > PCIe 5.0 Specification. > > > 6.12 Access Control Services (ACS) > > > Implementation of ACS in RCiEPs is permitted but not required. It is > > > explicitly permitted that, within a single Root Complex, some RCiEPs > > > implement ACS and some do not. It is strongly recommended that Root Complex > > > implementations ensure that all accesses originating from RCiEPs > > > (PFs and VFs) without ACS capability are first subjected to processing by > > > the Translation Agent (TA) in the Root Complex before further decoding and > > > processing. The details of such Root Complex handling are outside the scope > > > of this specification. > > > > > > Since Linux didn't give special treatment to allow this exception, certain > > > RCiEP MFD devices are getting grouped in a single iommu group. This > > > doesn't permit a single device to be assigned to a guest for instance. > > > > > > In one vendor system: Device 14.x were grouped in a single IOMMU group. > > > > > > /sys/kernel/iommu_groups/5/devices/0000:00:14.0 > > > /sys/kernel/iommu_groups/5/devices/0000:00:14.2 > > > /sys/kernel/iommu_groups/5/devices/0000:00:14.3 > > > > > > After the patch: > > > /sys/kernel/iommu_groups/5/devices/0000:00:14.0 > > > /sys/kernel/iommu_groups/5/devices/0000:00:14.2 > > > /sys/kernel/iommu_groups/6/devices/0000:00:14.3 <<< new group > > > > > > 14.0 and 14.2 are integrated devices, but legacy end points. > > > Whereas 14.3 was a PCIe compliant RCiEP. > > > > > > 00:14.3 Network controller: Intel Corporation Device 9df0 (rev 30) > > > Capabilities: [40] Express (v2) Root Complex Integrated Endpoint, MSI 00 > > > > > > This permits assigning this device to a guest VM. > > > > > > Fixes: f096c061f552 ("iommu: Rework iommu_group_get_for_pci_dev()") > > > Signed-off-by: Ashok Raj <ashok.raj@intel.com> > > > To: Joerg Roedel <joro@8bytes.org> > > > To: Bjorn Helgaas <bhelgaas@google.com> > > > Cc: linux-kernel@vger.kernel.org > > > Cc: iommu@lists.linux-foundation.org > > > Cc: Lu Baolu <baolu.lu@linux.intel.com> > > > Cc: Alex Williamson <alex.williamson@redhat.com> > > > Cc: Darrel Goeddel <DGoeddel@forcepoint.com> > > > Cc: Mark Scott <mscott@forcepoint.com>, > > > Cc: Romil Sharma <rsharma@forcepoint.com> > > > Cc: Ashok Raj <ashok.raj@intel.com> > > > --- > > > drivers/iommu/iommu.c | 15 ++++++++++++++- > > > 1 file changed, 14 insertions(+), 1 deletion(-) > > > > > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > > > index 2b471419e26c..5744bd65f3e2 100644 > > > --- a/drivers/iommu/iommu.c > > > +++ b/drivers/iommu/iommu.c > > > @@ -1187,7 +1187,20 @@ static struct iommu_group *get_pci_function_alias_group(struct pci_dev *pdev, > > > struct pci_dev *tmp = NULL; > > > struct iommu_group *group; > > > > > > - if (!pdev->multifunction || pci_acs_enabled(pdev, REQ_ACS_FLAGS)) > > > + /* > > > + * PCI Spec 5.0, Section 6.12 Access Control Service > > > + * Implementation of ACS in RCiEPs is permitted but not required. > > > + * It is explicitly permitted that, within a single Root > > > + * Complex, some RCiEPs implement ACS and some do not. It is > > > + * strongly recommended that Root Complex implementations ensure > > > + * that all accesses originating from RCiEPs (PFs and VFs) without > > > + * ACS capability are first subjected to processing by the Translation > > > + * Agent (TA) in the Root Complex before further decoding and > > > + * processing. > > > + */ > > > > Is the language here really strong enough to make this change? ACS is > > an optional feature, so being permitted but not required is rather > > meaningless. The spec is also specifically avoiding the words "must" > > or "shall" and even when emphasized with "strongly", we still only have > > a recommendation that may or may not be honored. This seems like a > > weak basis for assuming that RCiEPs universally honor this > > recommendation. Thanks, > > > > We are speaking about PCIe spec, where people write it about 5 years ahead > and every vendor tries to massage their product behavior with vague > words like this.. :) > > But honestly for any any RCiEP, or even integrated endpoints, there > is no way to send them except up north. These aren't behind a RP. But they are multi-function devices and the spec doesn't define routing within multifunction packages. A single function RCiEP will already be assumed isolated within its own group. > I did check with couple folks who are part of the SIG, and seem to agree > that ACS treatment for RCiEP's doesn't mean much. > > I understand the language isn't strong, but it doesn't seem like ACS should > be a strong requirement for RCiEP's and reasonable to relax. > > What are your thoughts? I think hardware vendors have ACS at their disposal to clarify when isolation is provided, otherwise vendors can submit quirks, but I don't see that the "strongly recommended" phrasing is sufficient to assume isolation between multifunction RCiEPs. Thanks, Alex
On Tue, May 05, 2020 at 08:05:14AM -0600, Alex Williamson wrote: > On Mon, 4 May 2020 23:11:07 -0700 > "Raj, Ashok" <ashok.raj@intel.com> wrote: > > > Hi Alex > > > > + Joerg, accidently missed in the Cc. > > > > On Mon, May 04, 2020 at 11:19:36PM -0600, Alex Williamson wrote: > > > On Mon, 4 May 2020 21:42:16 -0700 > > > Ashok Raj <ashok.raj@intel.com> wrote: > > > > > > > PCIe Spec recommends we can relax ACS requirement for RCIEP devices. > > > > > > > > PCIe 5.0 Specification. > > > > 6.12 Access Control Services (ACS) > > > > Implementation of ACS in RCiEPs is permitted but not required. It is > > > > explicitly permitted that, within a single Root Complex, some RCiEPs > > > > implement ACS and some do not. It is strongly recommended that Root Complex > > > > implementations ensure that all accesses originating from RCiEPs > > > > (PFs and VFs) without ACS capability are first subjected to processing by > > > > the Translation Agent (TA) in the Root Complex before further decoding and > > > > processing. The details of such Root Complex handling are outside the scope > > > > of this specification. > > > > > > > > > > Is the language here really strong enough to make this change? ACS is > > > an optional feature, so being permitted but not required is rather > > > meaningless. The spec is also specifically avoiding the words "must" > > > or "shall" and even when emphasized with "strongly", we still only have > > > a recommendation that may or may not be honored. This seems like a > > > weak basis for assuming that RCiEPs universally honor this > > > recommendation. Thanks, > > > > > > > We are speaking about PCIe spec, where people write it about 5 years ahead > > and every vendor tries to massage their product behavior with vague > > words like this.. :) > > > > But honestly for any any RCiEP, or even integrated endpoints, there > > is no way to send them except up north. These aren't behind a RP. > > But they are multi-function devices and the spec doesn't define routing > within multifunction packages. A single function RCiEP will already be > assumed isolated within its own group. That's right. The other two devices only have legacy PCI headers. So they can't claim to be RCiEP's but just integrated endpoints. The legacy devices don't even have a PCIe header. I honestly don't know why these are groped as MFD's in the first place. > > > I did check with couple folks who are part of the SIG, and seem to agree > > that ACS treatment for RCiEP's doesn't mean much. > > > > I understand the language isn't strong, but it doesn't seem like ACS should > > be a strong requirement for RCiEP's and reasonable to relax. > > > > What are your thoughts? > > I think hardware vendors have ACS at their disposal to clarify when > isolation is provided, otherwise vendors can submit quirks, but I don't > see that the "strongly recommended" phrasing is sufficient to assume > isolation between multifunction RCiEPs. Thanks, You point is that integrated MFD endpoints, without ACS, there is no gaurantee to SW that they are isolated. As far as a quirk, do you think: - a cmdline optput for integrated endpoints, and RCiEP's suffice? along with a compile time default that is strict enforcement - typical vid/did type exception list? A more generic way to ask for exception would be scalable until we can stop those type of integrated devices. Or we need to maintain these device lists for eternity. Cheers, Ashok
On Tue, 5 May 2020 07:56:06 -0700 "Raj, Ashok" <ashok.raj@intel.com> wrote: > On Tue, May 05, 2020 at 08:05:14AM -0600, Alex Williamson wrote: > > On Mon, 4 May 2020 23:11:07 -0700 > > "Raj, Ashok" <ashok.raj@intel.com> wrote: > > > > > Hi Alex > > > > > > + Joerg, accidently missed in the Cc. > > > > > > On Mon, May 04, 2020 at 11:19:36PM -0600, Alex Williamson wrote: > > > > On Mon, 4 May 2020 21:42:16 -0700 > > > > Ashok Raj <ashok.raj@intel.com> wrote: > > > > > > > > > PCIe Spec recommends we can relax ACS requirement for RCIEP devices. > > > > > > > > > > PCIe 5.0 Specification. > > > > > 6.12 Access Control Services (ACS) > > > > > Implementation of ACS in RCiEPs is permitted but not required. It is > > > > > explicitly permitted that, within a single Root Complex, some RCiEPs > > > > > implement ACS and some do not. It is strongly recommended that Root Complex > > > > > implementations ensure that all accesses originating from RCiEPs > > > > > (PFs and VFs) without ACS capability are first subjected to processing by > > > > > the Translation Agent (TA) in the Root Complex before further decoding and > > > > > processing. The details of such Root Complex handling are outside the scope > > > > > of this specification. > > > > > > > > > > > > > Is the language here really strong enough to make this change? ACS is > > > > an optional feature, so being permitted but not required is rather > > > > meaningless. The spec is also specifically avoiding the words "must" > > > > or "shall" and even when emphasized with "strongly", we still only have > > > > a recommendation that may or may not be honored. This seems like a > > > > weak basis for assuming that RCiEPs universally honor this > > > > recommendation. Thanks, > > > > > > > > > > We are speaking about PCIe spec, where people write it about 5 years ahead > > > and every vendor tries to massage their product behavior with vague > > > words like this.. :) > > > > > > But honestly for any any RCiEP, or even integrated endpoints, there > > > is no way to send them except up north. These aren't behind a RP. > > > > But they are multi-function devices and the spec doesn't define routing > > within multifunction packages. A single function RCiEP will already be > > assumed isolated within its own group. > > That's right. The other two devices only have legacy PCI headers. So > they can't claim to be RCiEP's but just integrated endpoints. The legacy > devices don't even have a PCIe header. > > I honestly don't know why these are groped as MFD's in the first place. > > > > > > I did check with couple folks who are part of the SIG, and seem to agree > > > that ACS treatment for RCiEP's doesn't mean much. > > > > > > I understand the language isn't strong, but it doesn't seem like ACS should > > > be a strong requirement for RCiEP's and reasonable to relax. > > > > > > What are your thoughts? > > > > I think hardware vendors have ACS at their disposal to clarify when > > isolation is provided, otherwise vendors can submit quirks, but I don't > > see that the "strongly recommended" phrasing is sufficient to assume > > isolation between multifunction RCiEPs. Thanks, > > You point is that integrated MFD endpoints, without ACS, there is no > gaurantee to SW that they are isolated. > > As far as a quirk, do you think: > - a cmdline optput for integrated endpoints, and RCiEP's suffice? > along with a compile time default that is strict enforcement > - typical vid/did type exception list? > > A more generic way to ask for exception would be scalable until we can stop > those type of integrated devices. Or we need to maintain these device lists > for eternity. I don't think the language in the spec is anything sufficient to handle RCiEP uniquely. We've previously rejected kernel command line opt-outs for ACS, and the extent to which those patches still float around the user community and are blindly used to separate IOMMU groups are a testament to the failure of this approach. Users do not have a basis for enabling this sort of opt-out. The benefit is obvious in the IOMMU grouping, but the risk is entirely unknown. A kconfig option is even worse as that means if you consume a downstream kernel, the downstream maintainers might have decided universally that isolation is less important than functionality. I think the only solution is that the hardware vendors need to step up to indicate where devices are isolated. The hardware can do this itself by implementing ACS, otherwise we need quirks. I think we've also generally been reluctant to accept quirks that provide a blanket opt-out for a vendor because doing so is akin to trying to predict the future (determining the behavior of all current and previous hardware is generally a sufficiently impossible task already). Perhaps if a vendor has a published internal policy regarding RCiEP isolation and is willing to stand by a quirk, there might be room to negotiate. Thanks, Alex
Hi Alex, I was able to find better language in the IOMMU spec that gaurantees the behavior we need. See below. On Tue, May 05, 2020 at 09:34:14AM -0600, Alex Williamson wrote: > On Tue, 5 May 2020 07:56:06 -0700 > "Raj, Ashok" <ashok.raj@intel.com> wrote: > > > On Tue, May 05, 2020 at 08:05:14AM -0600, Alex Williamson wrote: > > > On Mon, 4 May 2020 23:11:07 -0700 > > > "Raj, Ashok" <ashok.raj@intel.com> wrote: > > > > > > > Hi Alex > > > > > > > > + Joerg, accidently missed in the Cc. > > > > > > > > On Mon, May 04, 2020 at 11:19:36PM -0600, Alex Williamson wrote: > > > > > On Mon, 4 May 2020 21:42:16 -0700 > > > > > Ashok Raj <ashok.raj@intel.com> wrote: > > > > > > > > > > > PCIe Spec recommends we can relax ACS requirement for RCIEP devices. > > > > > > > > > > > > PCIe 5.0 Specification. > > > > > > 6.12 Access Control Services (ACS) > > > > > > Implementation of ACS in RCiEPs is permitted but not required. It is > > > > > > explicitly permitted that, within a single Root Complex, some RCiEPs > > > > > > implement ACS and some do not. It is strongly recommended that Root Complex > > > > > > implementations ensure that all accesses originating from RCiEPs > > > > > > (PFs and VFs) without ACS capability are first subjected to processing by > > > > > > the Translation Agent (TA) in the Root Complex before further decoding and > > > > > > processing. The details of such Root Complex handling are outside the scope > > > > > > of this specification. > > > > > > > > > > > > > > > > Is the language here really strong enough to make this change? ACS is > > > > > an optional feature, so being permitted but not required is rather > > > > > meaningless. The spec is also specifically avoiding the words "must" > > > > > or "shall" and even when emphasized with "strongly", we still only have > > > > > a recommendation that may or may not be honored. This seems like a > > > > > weak basis for assuming that RCiEPs universally honor this > > > > > recommendation. Thanks, > > > > > > > > > > > > > We are speaking about PCIe spec, where people write it about 5 years ahead > > > > and every vendor tries to massage their product behavior with vague > > > > words like this.. :) > > > > > > > > But honestly for any any RCiEP, or even integrated endpoints, there > > > > is no way to send them except up north. These aren't behind a RP. > > > > > > But they are multi-function devices and the spec doesn't define routing > > > within multifunction packages. A single function RCiEP will already be > > > assumed isolated within its own group. > > > > That's right. The other two devices only have legacy PCI headers. So > > they can't claim to be RCiEP's but just integrated endpoints. The legacy > > devices don't even have a PCIe header. > > > > I honestly don't know why these are groped as MFD's in the first place. > > > > > > > > > I did check with couple folks who are part of the SIG, and seem to agree > > > > that ACS treatment for RCiEP's doesn't mean much. > > > > > > > > I understand the language isn't strong, but it doesn't seem like ACS should > > > > be a strong requirement for RCiEP's and reasonable to relax. > > > > > > > > What are your thoughts? > > > > > > I think hardware vendors have ACS at their disposal to clarify when > > > isolation is provided, otherwise vendors can submit quirks, but I don't > > > see that the "strongly recommended" phrasing is sufficient to assume > > > isolation between multifunction RCiEPs. Thanks, > > > > You point is that integrated MFD endpoints, without ACS, there is no > > gaurantee to SW that they are isolated. > > > > As far as a quirk, do you think: > > - a cmdline optput for integrated endpoints, and RCiEP's suffice? > > along with a compile time default that is strict enforcement > > - typical vid/did type exception list? > > > > A more generic way to ask for exception would be scalable until we can stop > > those type of integrated devices. Or we need to maintain these device lists > > for eternity. > > I don't think the language in the spec is anything sufficient to handle > RCiEP uniquely. We've previously rejected kernel command line opt-outs > for ACS, and the extent to which those patches still float around the > user community and are blindly used to separate IOMMU groups are a > testament to the failure of this approach. Users do not have a basis > for enabling this sort of opt-out. The benefit is obvious in the IOMMU > grouping, but the risk is entirely unknown. A kconfig option is even > worse as that means if you consume a downstream kernel, the downstream > maintainers might have decided universally that isolation is less > important than functionality. We discussed this internally, and Intel vt-d spec does spell out clearly in Section 3.16 Root-Complex Peer to Peer Considerations. The spec clearly calls out that all p2p must be done on translated addresses and therefore must go through the IOMMU. I suppose they should also have some similar platform gauranteed behavior for RCiEP's or MFD's *Must* behave as follows. The language is strict and when IOMMU is enabled in the platform, everything is sent up north to the IOMMU agent. 3.16 Root-Complex Peer to Peer Considerations When DMA remapping is enabled, peer-to-peer requests through the Root-Complex must be handled as follows: • The input address in the request is translated (through first-level, second-level or nested translation) to a host physical address (HPA). The address decoding for peer addresses must be done only on the translated HPA. Hardware implementations are free to further limit peer-to-peer accesses to specific host physical address regions (or to completely disallow peer-forwarding of translated requests). • Since address translation changes the contents (address field) of the PCI Express Transaction Layer Packet (TLP), for PCI Express peer-to-peer requests with ECRC, the Root-Complex hardware must use the new ECRC (re-computed with the translated address) if it decides to forward the TLP as a peer request. • Root-ports, and multi-function root-complex integrated endpoints, may support additional peerto-peer control features by supporting PCI Express Access Control Services (ACS) capability. Refer to ACS capability in PCI Express specifications for details. > to indicate where devices are isolated. The hardware can do this > itself by implementing ACS, otherwise we need quirks. I think we've > also generally been reluctant to accept quirks that provide a blanket > opt-out for a vendor because doing so is akin to trying to predict the > future (determining the behavior of all current and previous hardware > is generally a sufficiently impossible task already). Perhaps if a > vendor has a published internal policy regarding RCiEP isolation and is > willing to stand by a quirk, there might be room to negotiate. Thanks, > > Alex >
On Tue, 26 May 2020 11:06:48 -0700 "Raj, Ashok" <ashok.raj@intel.com> wrote: > Hi Alex, > > I was able to find better language in the IOMMU spec that gaurantees > the behavior we need. See below. > > > On Tue, May 05, 2020 at 09:34:14AM -0600, Alex Williamson wrote: > > On Tue, 5 May 2020 07:56:06 -0700 > > "Raj, Ashok" <ashok.raj@intel.com> wrote: > > > > > On Tue, May 05, 2020 at 08:05:14AM -0600, Alex Williamson wrote: > > > > On Mon, 4 May 2020 23:11:07 -0700 > > > > "Raj, Ashok" <ashok.raj@intel.com> wrote: > > > > > > > > > Hi Alex > > > > > > > > > > + Joerg, accidently missed in the Cc. > > > > > > > > > > On Mon, May 04, 2020 at 11:19:36PM -0600, Alex Williamson wrote: > > > > > > On Mon, 4 May 2020 21:42:16 -0700 > > > > > > Ashok Raj <ashok.raj@intel.com> wrote: > > > > > > > > > > > > > PCIe Spec recommends we can relax ACS requirement for RCIEP devices. > > > > > > > > > > > > > > PCIe 5.0 Specification. > > > > > > > 6.12 Access Control Services (ACS) > > > > > > > Implementation of ACS in RCiEPs is permitted but not required. It is > > > > > > > explicitly permitted that, within a single Root Complex, some RCiEPs > > > > > > > implement ACS and some do not. It is strongly recommended that Root Complex > > > > > > > implementations ensure that all accesses originating from RCiEPs > > > > > > > (PFs and VFs) without ACS capability are first subjected to processing by > > > > > > > the Translation Agent (TA) in the Root Complex before further decoding and > > > > > > > processing. The details of such Root Complex handling are outside the scope > > > > > > > of this specification. > > > > > > > > > > > > > > > > > > > Is the language here really strong enough to make this change? ACS is > > > > > > an optional feature, so being permitted but not required is rather > > > > > > meaningless. The spec is also specifically avoiding the words "must" > > > > > > or "shall" and even when emphasized with "strongly", we still only have > > > > > > a recommendation that may or may not be honored. This seems like a > > > > > > weak basis for assuming that RCiEPs universally honor this > > > > > > recommendation. Thanks, > > > > > > > > > > > > > > > > We are speaking about PCIe spec, where people write it about 5 years ahead > > > > > and every vendor tries to massage their product behavior with vague > > > > > words like this.. :) > > > > > > > > > > But honestly for any any RCiEP, or even integrated endpoints, there > > > > > is no way to send them except up north. These aren't behind a RP. > > > > > > > > But they are multi-function devices and the spec doesn't define routing > > > > within multifunction packages. A single function RCiEP will already be > > > > assumed isolated within its own group. > > > > > > That's right. The other two devices only have legacy PCI headers. So > > > they can't claim to be RCiEP's but just integrated endpoints. The legacy > > > devices don't even have a PCIe header. > > > > > > I honestly don't know why these are groped as MFD's in the first place. > > > > > > > > > > > > I did check with couple folks who are part of the SIG, and seem to agree > > > > > that ACS treatment for RCiEP's doesn't mean much. > > > > > > > > > > I understand the language isn't strong, but it doesn't seem like ACS should > > > > > be a strong requirement for RCiEP's and reasonable to relax. > > > > > > > > > > What are your thoughts? > > > > > > > > I think hardware vendors have ACS at their disposal to clarify when > > > > isolation is provided, otherwise vendors can submit quirks, but I don't > > > > see that the "strongly recommended" phrasing is sufficient to assume > > > > isolation between multifunction RCiEPs. Thanks, > > > > > > You point is that integrated MFD endpoints, without ACS, there is no > > > gaurantee to SW that they are isolated. > > > > > > As far as a quirk, do you think: > > > - a cmdline optput for integrated endpoints, and RCiEP's suffice? > > > along with a compile time default that is strict enforcement > > > - typical vid/did type exception list? > > > > > > A more generic way to ask for exception would be scalable until we can stop > > > those type of integrated devices. Or we need to maintain these device lists > > > for eternity. > > > > I don't think the language in the spec is anything sufficient to handle > > RCiEP uniquely. We've previously rejected kernel command line opt-outs > > for ACS, and the extent to which those patches still float around the > > user community and are blindly used to separate IOMMU groups are a > > testament to the failure of this approach. Users do not have a basis > > for enabling this sort of opt-out. The benefit is obvious in the IOMMU > > grouping, but the risk is entirely unknown. A kconfig option is even > > worse as that means if you consume a downstream kernel, the downstream > > maintainers might have decided universally that isolation is less > > important than functionality. > > We discussed this internally, and Intel vt-d spec does spell out clearly > in Section 3.16 Root-Complex Peer to Peer Considerations. The spec clearly > calls out that all p2p must be done on translated addresses and therefore > must go through the IOMMU. > > I suppose they should also have some similar platform gauranteed behavior > for RCiEP's or MFD's *Must* behave as follows. The language is strict and > when IOMMU is enabled in the platform, everything is sent up north to the > IOMMU agent. > > 3.16 Root-Complex Peer to Peer Considerations > When DMA remapping is enabled, peer-to-peer requests through the > Root-Complex must be handled > as follows: > • The input address in the request is translated (through first-level, > second-level or nested translation) to a host physical address (HPA). > The address decoding for peer addresses must be done only on the > translated HPA. Hardware implementations are free to further limit > peer-to-peer accesses to specific host physical address regions > (or to completely disallow peer-forwarding of translated requests). > • Since address translation changes the contents (address field) of the PCI > Express Transaction Layer Packet (TLP), for PCI Express peer-to-peer > requests with ECRC, the Root-Complex hardware must use the new ECRC > (re-computed with the translated address) if it decides to forward > the TLP as a peer request. > • Root-ports, and multi-function root-complex integrated endpoints, may > support additional peerto-peer control features by supporting PCI Express > Access Control Services (ACS) capability. Refer to ACS capability in > PCI Express specifications for details. That sounds like it might be a reasonable basis for quirking all RCiEPs on VT-d platforms if Intel is willing to stand behind it. Thanks, Alex
On Tue, May 26, 2020 at 12:26:54PM -0600, Alex Williamson wrote: > > > > > > I don't think the language in the spec is anything sufficient to handle > > > RCiEP uniquely. We've previously rejected kernel command line opt-outs > > > for ACS, and the extent to which those patches still float around the > > > user community and are blindly used to separate IOMMU groups are a > > > testament to the failure of this approach. Users do not have a basis > > > for enabling this sort of opt-out. The benefit is obvious in the IOMMU > > > grouping, but the risk is entirely unknown. A kconfig option is even > > > worse as that means if you consume a downstream kernel, the downstream > > > maintainers might have decided universally that isolation is less > > > important than functionality. > > > > We discussed this internally, and Intel vt-d spec does spell out clearly > > in Section 3.16 Root-Complex Peer to Peer Considerations. The spec clearly > > calls out that all p2p must be done on translated addresses and therefore > > must go through the IOMMU. > > > > I suppose they should also have some similar platform gauranteed behavior > > for RCiEP's or MFD's *Must* behave as follows. The language is strict and > > when IOMMU is enabled in the platform, everything is sent up north to the > > IOMMU agent. > > > > 3.16 Root-Complex Peer to Peer Considerations > > When DMA remapping is enabled, peer-to-peer requests through the > > Root-Complex must be handled > > as follows: > > • The input address in the request is translated (through first-level, > > second-level or nested translation) to a host physical address (HPA). > > The address decoding for peer addresses must be done only on the > > translated HPA. Hardware implementations are free to further limit > > peer-to-peer accesses to specific host physical address regions > > (or to completely disallow peer-forwarding of translated requests). > > • Since address translation changes the contents (address field) of the PCI > > Express Transaction Layer Packet (TLP), for PCI Express peer-to-peer > > requests with ECRC, the Root-Complex hardware must use the new ECRC > > (re-computed with the translated address) if it decides to forward > > the TLP as a peer request. > > • Root-ports, and multi-function root-complex integrated endpoints, may > > support additional peerto-peer control features by supporting PCI Express > > Access Control Services (ACS) capability. Refer to ACS capability in > > PCI Express specifications for details. > > That sounds like it might be a reasonable basis for quirking all RCiEPs > on VT-d platforms if Intel is willing to stand behind it. Thanks, > Sounds good.. that's what i hear from our platform teams. If there is a violation it would be a bug in silicon.
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 2b471419e26c..5744bd65f3e2 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -1187,7 +1187,20 @@ static struct iommu_group *get_pci_function_alias_group(struct pci_dev *pdev, struct pci_dev *tmp = NULL; struct iommu_group *group; - if (!pdev->multifunction || pci_acs_enabled(pdev, REQ_ACS_FLAGS)) + /* + * PCI Spec 5.0, Section 6.12 Access Control Service + * Implementation of ACS in RCiEPs is permitted but not required. + * It is explicitly permitted that, within a single Root + * Complex, some RCiEPs implement ACS and some do not. It is + * strongly recommended that Root Complex implementations ensure + * that all accesses originating from RCiEPs (PFs and VFs) without + * ACS capability are first subjected to processing by the Translation + * Agent (TA) in the Root Complex before further decoding and + * processing. + */ + if (!pdev->multifunction || + (pci_pcie_type(pdev) == PCI_EXP_TYPE_RC_END) || + pci_acs_enabled(pdev, REQ_ACS_FLAGS)) return NULL; for_each_pci_dev(tmp) {
PCIe Spec recommends we can relax ACS requirement for RCIEP devices. PCIe 5.0 Specification. 6.12 Access Control Services (ACS) Implementation of ACS in RCiEPs is permitted but not required. It is explicitly permitted that, within a single Root Complex, some RCiEPs implement ACS and some do not. It is strongly recommended that Root Complex implementations ensure that all accesses originating from RCiEPs (PFs and VFs) without ACS capability are first subjected to processing by the Translation Agent (TA) in the Root Complex before further decoding and processing. The details of such Root Complex handling are outside the scope of this specification. Since Linux didn't give special treatment to allow this exception, certain RCiEP MFD devices are getting grouped in a single iommu group. This doesn't permit a single device to be assigned to a guest for instance. In one vendor system: Device 14.x were grouped in a single IOMMU group. /sys/kernel/iommu_groups/5/devices/0000:00:14.0 /sys/kernel/iommu_groups/5/devices/0000:00:14.2 /sys/kernel/iommu_groups/5/devices/0000:00:14.3 After the patch: /sys/kernel/iommu_groups/5/devices/0000:00:14.0 /sys/kernel/iommu_groups/5/devices/0000:00:14.2 /sys/kernel/iommu_groups/6/devices/0000:00:14.3 <<< new group 14.0 and 14.2 are integrated devices, but legacy end points. Whereas 14.3 was a PCIe compliant RCiEP. 00:14.3 Network controller: Intel Corporation Device 9df0 (rev 30) Capabilities: [40] Express (v2) Root Complex Integrated Endpoint, MSI 00 This permits assigning this device to a guest VM. Fixes: f096c061f552 ("iommu: Rework iommu_group_get_for_pci_dev()") Signed-off-by: Ashok Raj <ashok.raj@intel.com> To: Joerg Roedel <joro@8bytes.org> To: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-kernel@vger.kernel.org Cc: iommu@lists.linux-foundation.org Cc: Lu Baolu <baolu.lu@linux.intel.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Darrel Goeddel <DGoeddel@forcepoint.com> Cc: Mark Scott <mscott@forcepoint.com>, Cc: Romil Sharma <rsharma@forcepoint.com> Cc: Ashok Raj <ashok.raj@intel.com> --- drivers/iommu/iommu.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-)