Message ID | 20230511191654.400720-6-stewart.hildebrand@amd.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | SMMU handling for PCIe Passthrough on ARM | expand |
On 11.05.2023 21:16, Stewart Hildebrand wrote: > @@ -762,9 +767,20 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn, > pdev->domain = NULL; > goto out; > } > +#ifdef CONFIG_HAS_DEVICE_TREE > + ret = iommu_add_dt_pci_device(pdev); > + if ( ret < 0 ) > + { > + printk(XENLOG_ERR "pci-iommu translation failed: %d\n", ret); > + goto out; > + } > +#endif > ret = iommu_add_device(pdev); Hmm, am I misremembering that in the earlier patch you had #else to invoke the alternative behavior? Now you end up calling both functions; if that's indeed intended, this may still want doing differently. Looking at the earlier patch introducing the function, I can't infer though whether that's intended: iommu_add_dt_pci_device() checks that the add_device hook is present, but then I didn't find any use of this hook. The revlog there suggests the check might be stale. If indeed the function does only preparatory work, I don't see why it would need naming "iommu_..."; I'd rather consider pci_add_dt_device() then. Plus in such a case #ifdef-ary here probably wants avoiding by introducing a suitable no-op stub for the !HAS_DEVICE_TREE case. Then ... > if ( ret ) > { > +#ifdef CONFIG_HAS_DEVICE_TREE > + iommu_fwspec_free(pci_to_dev(pdev)); > +#endif ... this (which I understand is doing the corresponding cleanup) then also wants wrapping in a suitably named tiny helper function. But yet further I'm then no longer convinced this is the right place for the addition. pci_add_device() is backing physdev hypercalls. It would seem to me that the function may want invoking yet one layer further up, or it may even want invoking from a brand new DT-specific physdev-op. This would then leave at least the x86-only paths (invoking pci_add_device() from outside of pci_physdev_op()) entirely alone. Jan
On 5/12/23 03:25, Jan Beulich wrote: > On 11.05.2023 21:16, Stewart Hildebrand wrote: >> @@ -762,9 +767,20 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn, >> pdev->domain = NULL; >> goto out; >> } >> +#ifdef CONFIG_HAS_DEVICE_TREE >> + ret = iommu_add_dt_pci_device(pdev); >> + if ( ret < 0 ) >> + { >> + printk(XENLOG_ERR "pci-iommu translation failed: %d\n", ret); >> + goto out; >> + } >> +#endif >> ret = iommu_add_device(pdev); > > Hmm, am I misremembering that in the earlier patch you had #else to > invoke the alternative behavior? You are remembering correctly. v1 had an #else, v2 does not. > Now you end up calling both functions; > if that's indeed intended, Yes, this is intentional. > this may still want doing differently. > Looking at the earlier patch introducing the function, I can't infer > though whether that's intended: iommu_add_dt_pci_device() checks that > the add_device hook is present, but then I didn't find any use of this > hook. The revlog there suggests the check might be stale. Ah, right, the ops->add_device check is stale in the other patch. Good catch, I'll remove it there. > If indeed the function does only preparatory work, I don't see why it > would need naming "iommu_..."; I'd rather consider pci_add_dt_device() > then. The function has now been reduced to reading SMMU configuration data from DT and mapping RID/BDF -> AXI stream ID. However, it is still SMMU related, and it is still invoking another iommu_ops hook function, dt_xlate (which is yet another AXI stream ID translation, separate from what is being discussed here). Does this justify keeping "iommu_..." in the name? I'm not convinced pci_add_dt_device() is a good name for it either (more on this below). > Plus in such a case #ifdef-ary here probably wants avoiding by > introducing a suitable no-op stub for the !HAS_DEVICE_TREE case. Then > ... > >> if ( ret ) >> { >> +#ifdef CONFIG_HAS_DEVICE_TREE >> + iommu_fwspec_free(pci_to_dev(pdev)); >> +#endif > > ... this (which I understand is doing the corresponding cleanup) then > also wants wrapping in a suitably named tiny helper function. Sure, I'm on board with eliminating/reducing the #ifdef-ary where possible. Will do. > But yet further I'm then no longer convinced this is the right place > for the addition. pci_add_device() is backing physdev hypercalls. It > would seem to me that the function may want invoking yet one layer > further up, or it may even want invoking from a brand new DT-specific > physdev-op. This would then leave at least the x86-only paths (invoking > pci_add_device() from outside of pci_physdev_op()) entirely alone. Let's establish that pci_add_device()/iommu_add_device() are already inherently performing tasks related to setting up a PCI device to work with an IOMMU. The preparatory work in question needs to happen after: pci_add_device() -> alloc_pdev() since we need to know all the possible RIDs (including those for phantom functions), but before the add_device iommu hook: pci_add_device() -> iommu_add_device() -> iommu_call(hd->platform_ops, add_device, ...) The preparatory work (i.e. mapping RID/BDF -> AXI stream ID) is inherently associated with setting up a PCI device to work with an ARM SMMU (but not any particular variant of the SMMU). The SMMU distinguishes what PCI device/function DMA traffic is associated with based on the derived AXI stream ID (sideband data), not the RID/BDF directly. See [1]. Moving the preparatory work one layer up would mean duplicating what alloc_pdev() is already doing to set up pdev->phantom_stride (which we need to figure out all RIDs for that particular device). Moving it down into the individual SMMU drivers (smmu_ops/platform_ops) would mean duplicating special phantom function handling in each SMMU driver, and further deviating from the Linux SMMU driver(s) they are based on. It still feels to me like pci_add_device() (or iommu_add_device()) is the right place to perform the RID/BDF -> AXI stream ID mapping. Since there's nothing inherently DT specific (or ACPI specific) about deriving sideband data from RID/BDF, let me propose a new name for the function (instead of iommu_add_dt_pci_device): iommu_derive_pci_device_sideband_IDs() Now, as far as DT and ACPI co-existing goes, I admit I haven't tested with CONFIG_ACPI=y yet (there seems to be some issues when both CONFIG_ARM_SMMU_V3=y and CONFIG_ACPI=y are enabled, even in staging). But I do recognize that we need a way support both CONFIG_HAS_DEVICE_TREE=y and CONFIG_ACPI=y simultaneously. Let me think on that for a bit... [1] https://www.kernel.org/doc/Documentation/devicetree/bindings/pci/pci-iommu.txt
On 12.05.2023 23:03, Stewart Hildebrand wrote: > On 5/12/23 03:25, Jan Beulich wrote: >> On 11.05.2023 21:16, Stewart Hildebrand wrote: >>> @@ -762,9 +767,20 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn, >>> pdev->domain = NULL; >>> goto out; >>> } >>> +#ifdef CONFIG_HAS_DEVICE_TREE >>> + ret = iommu_add_dt_pci_device(pdev); >>> + if ( ret < 0 ) >>> + { >>> + printk(XENLOG_ERR "pci-iommu translation failed: %d\n", ret); >>> + goto out; >>> + } >>> +#endif >>> ret = iommu_add_device(pdev); >> >> Hmm, am I misremembering that in the earlier patch you had #else to >> invoke the alternative behavior? > > You are remembering correctly. v1 had an #else, v2 does not. > >> Now you end up calling both functions; >> if that's indeed intended, > > Yes, this is intentional. > >> this may still want doing differently. >> Looking at the earlier patch introducing the function, I can't infer >> though whether that's intended: iommu_add_dt_pci_device() checks that >> the add_device hook is present, but then I didn't find any use of this >> hook. The revlog there suggests the check might be stale. > > Ah, right, the ops->add_device check is stale in the other patch. Good catch, I'll remove it there. > >> If indeed the function does only preparatory work, I don't see why it >> would need naming "iommu_..."; I'd rather consider pci_add_dt_device() >> then. > > The function has now been reduced to reading SMMU configuration data from DT and mapping RID/BDF -> AXI stream ID. However, it is still SMMU related, and it is still invoking another iommu_ops hook function, dt_xlate (which is yet another AXI stream ID translation, separate from what is being discussed here). Does this justify keeping "iommu_..." in the name? I'm not convinced pci_add_dt_device() is a good name for it either (more on this below). The function being SMMU-related pretty strongly suggests it wants to be invoked via a hook. If the add_device() one isn't suitable, perhaps we need a new (optional) prepare_device() one? With pci_add_device() then calling iommu_prepare_device(), wrapping the hook invocation? But just to be clear: A new hook would need enough justification as to the existing one being unsuitable. Jan
On 12.05.2023 23:03, Stewart Hildebrand wrote: > On 5/12/23 03:25, Jan Beulich wrote: >> On 11.05.2023 21:16, Stewart Hildebrand wrote: >>> @@ -762,9 +767,20 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn, >>> pdev->domain = NULL; >>> goto out; >>> } >>> +#ifdef CONFIG_HAS_DEVICE_TREE >>> + ret = iommu_add_dt_pci_device(pdev); >>> + if ( ret < 0 ) >>> + { >>> + printk(XENLOG_ERR "pci-iommu translation failed: %d\n", ret); >>> + goto out; >>> + } >>> +#endif >>> ret = iommu_add_device(pdev); >> >> Hmm, am I misremembering that in the earlier patch you had #else to >> invoke the alternative behavior? > > You are remembering correctly. v1 had an #else, v2 does not. > >> Now you end up calling both functions; >> if that's indeed intended, > > Yes, this is intentional. > >> this may still want doing differently. >> Looking at the earlier patch introducing the function, I can't infer >> though whether that's intended: iommu_add_dt_pci_device() checks that >> the add_device hook is present, but then I didn't find any use of this >> hook. The revlog there suggests the check might be stale. > > Ah, right, the ops->add_device check is stale in the other patch. Good catch, I'll remove it there. > >> If indeed the function does only preparatory work, I don't see why it >> would need naming "iommu_..."; I'd rather consider pci_add_dt_device() >> then. > > The function has now been reduced to reading SMMU configuration data from DT and mapping RID/BDF -> AXI stream ID. However, it is still SMMU related, and it is still invoking another iommu_ops hook function, dt_xlate (which is yet another AXI stream ID translation, separate from what is being discussed here). Does this justify keeping "iommu_..." in the name? I'm not convinced pci_add_dt_device() is a good name for it either (more on this below). The function being SMMU-related pretty strongly suggests it wants to be invoked via a hook. If the add_device() one isn't suitable, perhaps we need a new (optional) prepare_device() one? With pci_add_device() then calling iommu_prepare_device(), wrapping the hook invocation? But just to be clear: A new hook would need enough justification as to the existing one being unsuitable. Jan
On 5/15/23 03:30, Jan Beulich wrote: > On 12.05.2023 23:03, Stewart Hildebrand wrote: >> On 5/12/23 03:25, Jan Beulich wrote: >>> On 11.05.2023 21:16, Stewart Hildebrand wrote: >>>> @@ -762,9 +767,20 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn, >>>> pdev->domain = NULL; >>>> goto out; >>>> } >>>> +#ifdef CONFIG_HAS_DEVICE_TREE >>>> + ret = iommu_add_dt_pci_device(pdev); >>>> + if ( ret < 0 ) >>>> + { >>>> + printk(XENLOG_ERR "pci-iommu translation failed: %d\n", ret); >>>> + goto out; >>>> + } >>>> +#endif >>>> ret = iommu_add_device(pdev); >>> >>> Hmm, am I misremembering that in the earlier patch you had #else to >>> invoke the alternative behavior? >> >> You are remembering correctly. v1 had an #else, v2 does not. >> >>> Now you end up calling both functions; >>> if that's indeed intended, >> >> Yes, this is intentional. >> >>> this may still want doing differently. >>> Looking at the earlier patch introducing the function, I can't infer >>> though whether that's intended: iommu_add_dt_pci_device() checks that >>> the add_device hook is present, but then I didn't find any use of this >>> hook. The revlog there suggests the check might be stale. >> >> Ah, right, the ops->add_device check is stale in the other patch. Good catch, I'll remove it there. >> >>> If indeed the function does only preparatory work, I don't see why it >>> would need naming "iommu_..."; I'd rather consider pci_add_dt_device() >>> then. >> >> The function has now been reduced to reading SMMU configuration data from DT and mapping RID/BDF -> AXI stream ID. However, it is still SMMU related, and it is still invoking another iommu_ops hook function, dt_xlate (which is yet another AXI stream ID translation, separate from what is being discussed here). Does this justify keeping "iommu_..." in the name? I'm not convinced pci_add_dt_device() is a good name for it either (more on this below). > > The function being SMMU-related pretty strongly suggests it wants to be > invoked via a hook. If the add_device() one isn't suitable, perhaps we > need a new (optional) prepare_device() one? With pci_add_device() then > calling iommu_prepare_device(), wrapping the hook invocation? > > But just to be clear: A new hook would need enough justification as to > the existing one being unsuitable. I'll move it to the add_device hook in v3
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c index b42acb8d7c09..6dbaae682773 100644 --- a/xen/drivers/passthrough/pci.c +++ b/xen/drivers/passthrough/pci.c @@ -34,6 +34,11 @@ #include <xen/vpci.h> #include <xen/msi.h> #include <xsm/xsm.h> + +#ifdef CONFIG_HAS_DEVICE_TREE +#include <asm/iommu_fwspec.h> +#endif + #include "ats.h" struct pci_seg { @@ -762,9 +767,20 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn, pdev->domain = NULL; goto out; } +#ifdef CONFIG_HAS_DEVICE_TREE + ret = iommu_add_dt_pci_device(pdev); + if ( ret < 0 ) + { + printk(XENLOG_ERR "pci-iommu translation failed: %d\n", ret); + goto out; + } +#endif ret = iommu_add_device(pdev); if ( ret ) { +#ifdef CONFIG_HAS_DEVICE_TREE + iommu_fwspec_free(pci_to_dev(pdev)); +#endif vpci_remove_device(pdev); list_del(&pdev->domain_list); pdev->domain = NULL;