Message ID | 20230501200305.168058-4-stewart.hildebrand@amd.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | SMMU handling for PCIe Passthrough on ARM | expand |
On 01.05.2023 22:03, Stewart Hildebrand wrote: > @@ -228,6 +229,9 @@ int iommu_release_dt_devices(struct domain *d); > * (IOMMU is not enabled/present or device is not connected to it). > */ > int iommu_add_dt_device(struct dt_device_node *np); > +#ifdef CONFIG_HAS_PCI > +int iommu_add_dt_pci_device(uint8_t devfn, struct pci_dev *pdev); > +#endif Why the first parameter? Doesn't the 2nd one describe the device in full? If this is about phantom devices, then I'd expect the function to take care of those (like iommu_add_device() does), rather than its caller. Jan
On 5/2/23 03:44, Jan Beulich wrote: > On 01.05.2023 22:03, Stewart Hildebrand wrote: >> @@ -228,6 +229,9 @@ int iommu_release_dt_devices(struct domain *d); >> * (IOMMU is not enabled/present or device is not connected to it). >> */ >> int iommu_add_dt_device(struct dt_device_node *np); >> +#ifdef CONFIG_HAS_PCI >> +int iommu_add_dt_pci_device(uint8_t devfn, struct pci_dev *pdev); >> +#endif > > Why the first parameter? Doesn't the 2nd one describe the device in full? It's related to phantom device/function handling, although this series unfortunately does not properly handle phantom devices. > If this is about phantom devices, then I'd expect the function to take > care of those (like iommu_add_device() does), rather than its caller. In the next patch ("[PATCH v1 4/6] pci/arm: Use iommu_add_dt_pci_device() instead of arch hook"), we will invoke iommu_add_dt_pci_device(devfn, pdev) from iommu_add_device(). Since iommu_add_device() iterates over the phantom functions, it would be redundant to also have such a loop inside of iommu_add_dt_pci_device(). If we are to properly handle phantom devices on ARM, the SMMU drivers (smmu.c/smmu-v3.c) would need some more work. In patches 5/6 and 6/6 in this series, we have: if ( devfn != pdev->devfn ) return -EOPNOTSUPP; Also, the ARM SMMU drivers in Xen currently only support a single AXI stream ID per device, so some development would need to occur in order to support phantom devices. Should phantom device support be part of this series, or would it be acceptable to introduce phantom device support on ARM as part of a future series? Lastly, I'd like to check my understanding since phantom devices are new to me. Here's my understanding: A phantom device is a device that advertises itself as single function, but actually has multiple phantom functions. These phantom functions will have unique requestor IDs (RID). The RID is essentially the BDF. To use a phantom device with Xen, we specify the pci-phantom command line option, and we identify phantom devices/functions in code by devfn != pdev->devfn. On ARM, we need to map/translate a BDF to an AXI stream ID in order for the SMMU to identify the device and apply translation. That BDF -> stream ID mapping is defined by the iommu-map/iommu-map-mask property in the device tree [1]. The BDF -> AXI stream ID mapping in DT could allow phantom devices (i.e. devices with phantom functions) to use different stream IDs based on the (phantom) function. So, in theory, on ARM, there is a possibility we may have a device that advertises itself as single function, but will issue AXI transactions with multiple different AXI stream IDs due to phantom functions. In this case, we will want each AXI stream ID to be programmed into the SMMU to avoid SMMU faults. Please correct me if I've misunderstood anything. [1] https://www.kernel.org/doc/Documentation/devicetree/bindings/pci/pci-iommu.txt
On 04.05.2023 23:54, Stewart Hildebrand wrote: > On 5/2/23 03:44, Jan Beulich wrote: >> On 01.05.2023 22:03, Stewart Hildebrand wrote: >>> @@ -228,6 +229,9 @@ int iommu_release_dt_devices(struct domain *d); >>> * (IOMMU is not enabled/present or device is not connected to it). >>> */ >>> int iommu_add_dt_device(struct dt_device_node *np); >>> +#ifdef CONFIG_HAS_PCI >>> +int iommu_add_dt_pci_device(uint8_t devfn, struct pci_dev *pdev); >>> +#endif >> >> Why the first parameter? Doesn't the 2nd one describe the device in full? > > It's related to phantom device/function handling, although this series unfortunately does not properly handle phantom devices. > >> If this is about phantom devices, then I'd expect the function to take >> care of those (like iommu_add_device() does), rather than its caller. > > In the next patch ("[PATCH v1 4/6] pci/arm: Use iommu_add_dt_pci_device() instead of arch hook"), we will invoke iommu_add_dt_pci_device(devfn, pdev) from iommu_add_device(). Which I think I said there already I consider wrong. > Since iommu_add_device() iterates over the phantom functions, it would be redundant to also have such a loop inside of iommu_add_dt_pci_device(). > > If we are to properly handle phantom devices on ARM, the SMMU drivers (smmu.c/smmu-v3.c) would need some more work. In patches 5/6 and 6/6 in this series, we have: > > if ( devfn != pdev->devfn ) > return -EOPNOTSUPP; > > Also, the ARM SMMU drivers in Xen currently only support a single AXI stream ID per device, so some development would need to occur in order to support phantom devices. > > Should phantom device support be part of this series, or would it be acceptable to introduce phantom device support on ARM as part of a future series? I wouldn't view this as a strict requirement, so long as it is made clear in respective patch descriptions. > Lastly, I'd like to check my understanding since phantom devices are new to me. Here's my understanding: > > A phantom device is a device that advertises itself as single function, but actually has multiple phantom functions. These phantom functions will have unique requestor IDs (RID). The RID is essentially the BDF. To use a phantom device with Xen, we specify the pci-phantom command line option, and we identify phantom devices/functions in code by devfn != pdev->devfn. The command line option is there only to work around errata, i.e. devices behaving as if they had phantom functions without advertising themselves as such. See our use of PCI_EXP_DEVCAP_PHANTOM. As you can see, this being PCIe only, and legacy PCI device bevaing this way would require use of the command line option. > On ARM, we need to map/translate a BDF to an AXI stream ID in order for the SMMU to identify the device and apply translation. That BDF -> stream ID mapping is defined by the iommu-map/iommu-map-mask property in the device tree [1]. The BDF -> AXI stream ID mapping in DT could allow phantom devices (i.e. devices with phantom functions) to use different stream IDs based on the (phantom) function. > > So, in theory, on ARM, there is a possibility we may have a device that advertises itself as single function, but will issue AXI transactions with multiple different AXI stream IDs due to phantom functions. In this case, we will want each AXI stream ID to be programmed into the SMMU to avoid SMMU faults. Right, which of course first requires that you know the mapping between these IDs. Jan > Please correct me if I've misunderstood anything. > > [1] https://www.kernel.org/doc/Documentation/devicetree/bindings/pci/pci-iommu.txt
On 5/5/23 02:20, Jan Beulich wrote: > On 04.05.2023 23:54, Stewart Hildebrand wrote: >> On 5/2/23 03:44, Jan Beulich wrote: >>> On 01.05.2023 22:03, Stewart Hildebrand wrote: >>>> @@ -228,6 +229,9 @@ int iommu_release_dt_devices(struct domain *d); >>>> * (IOMMU is not enabled/present or device is not connected to it). >>>> */ >>>> int iommu_add_dt_device(struct dt_device_node *np); >>>> +#ifdef CONFIG_HAS_PCI >>>> +int iommu_add_dt_pci_device(uint8_t devfn, struct pci_dev *pdev); >>>> +#endif >>> >>> Why the first parameter? Doesn't the 2nd one describe the device in full? >> >> It's related to phantom device/function handling, although this series unfortunately does not properly handle phantom devices. >> >>> If this is about phantom devices, then I'd expect the function to take >>> care of those (like iommu_add_device() does), rather than its caller. >> >> In the next patch ("[PATCH v1 4/6] pci/arm: Use iommu_add_dt_pci_device() instead of arch hook"), we will invoke iommu_add_dt_pci_device(devfn, pdev) from iommu_add_device(). > > Which I think I said there already I consider wrong. Okay, I'm following now. Right, so, the first parameter is not needed. >> Since iommu_add_device() iterates over the phantom functions, it would be redundant to also have such a loop inside of iommu_add_dt_pci_device(). >> >> If we are to properly handle phantom devices on ARM, the SMMU drivers (smmu.c/smmu-v3.c) would need some more work. In patches 5/6 and 6/6 in this series, we have: >> >> if ( devfn != pdev->devfn ) >> return -EOPNOTSUPP; >> >> Also, the ARM SMMU drivers in Xen currently only support a single AXI stream ID per device, so some development would need to occur in order to support phantom devices. I spoke too soon, the SMMU drivers do appear to support multiple IDs. >> >> Should phantom device support be part of this series, or would it be acceptable to introduce phantom device support on ARM as part of a future series? > > I wouldn't view this as a strict requirement, so long as it is made clear in > respective patch descriptions. After doing a little more investigation, I should be able to add in the plumbing to iterate over the phantom functions in iommu_add_dt_pci_device(). Stay tuned for v2 of this series. >> Lastly, I'd like to check my understanding since phantom devices are new to me. Here's my understanding: >> >> A phantom device is a device that advertises itself as single function, but actually has multiple phantom functions. These phantom functions will have unique requestor IDs (RID). The RID is essentially the BDF. To use a phantom device with Xen, we specify the pci-phantom command line option, and we identify phantom devices/functions in code by devfn != pdev->devfn. > > The command line option is there only to work around errata, i.e. devices > behaving as if they had phantom functions without advertising themselves > as such. See our use of PCI_EXP_DEVCAP_PHANTOM. As you can see, this being > PCIe only, and legacy PCI device bevaing this way would require use of the > command line option. > >> On ARM, we need to map/translate a BDF to an AXI stream ID in order for the SMMU to identify the device and apply translation. That BDF -> stream ID mapping is defined by the iommu-map/iommu-map-mask property in the device tree [1]. The BDF -> AXI stream ID mapping in DT could allow phantom devices (i.e. devices with phantom functions) to use different stream IDs based on the (phantom) function. >> >> So, in theory, on ARM, there is a possibility we may have a device that advertises itself as single function, but will issue AXI transactions with multiple different AXI stream IDs due to phantom functions. In this case, we will want each AXI stream ID to be programmed into the SMMU to avoid SMMU faults. > > Right, which of course first requires that you know the mapping between > these IDs. Thanks for clarifying! Stewart
diff --git a/xen/drivers/passthrough/device_tree.c b/xen/drivers/passthrough/device_tree.c index 1b50f4670944..ef98f343eef2 100644 --- a/xen/drivers/passthrough/device_tree.c +++ b/xen/drivers/passthrough/device_tree.c @@ -151,6 +151,151 @@ static int iommu_dt_xlate(struct device *dev, return ops->dt_xlate(dev, iommu_spec); } +#ifdef CONFIG_HAS_PCI +int iommu_dt_pci_map_id(const struct dt_device_node *np, uint32_t id, + const char *map_name, const char *map_mask_name, + struct dt_device_node **target, uint32_t *id_out) +{ + uint32_t map_mask, masked_id, map_len; + const __be32 *map = NULL; + + if ( !np || !map_name || (!target && !id_out) ) + return -EINVAL; + + map = dt_get_property(np, map_name, &map_len); + if ( !map ) + { + if ( target ) + return -ENODEV; + /* Otherwise, no map implies no translation */ + *id_out = id; + return 0; + } + + if ( !map_len || map_len % (4 * sizeof(*map)) ) + { + printk(XENLOG_ERR "%pOF: Error: Bad %s length: %d\n", np, + map_name, map_len); + return -EINVAL; + } + + /* The default is to select all bits. */ + map_mask = 0xffffffff; + + /* + * Can be overridden by "{iommu,msi}-map-mask" property. + * If of_property_read_u32() fails, the default is used. + */ + if ( map_mask_name ) + dt_property_read_u32(np, map_mask_name, &map_mask); + + masked_id = map_mask & id; + for ( ; (int)map_len > 0; map_len -= 4 * sizeof(*map), map += 4 ) + { + struct dt_device_node *phandle_node; + uint32_t id_base = be32_to_cpup(map + 0); + uint32_t phandle = be32_to_cpup(map + 1); + uint32_t out_base = be32_to_cpup(map + 2); + uint32_t id_len = be32_to_cpup(map + 3); + + if ( id_base & ~map_mask ) + { + printk(XENLOG_ERR "%pOF: Invalid %s translation - %s-mask (0x%x) ignores id-base (0x%x)\n", + np, map_name, map_name, map_mask, id_base); + return -EFAULT; + } + + if ( masked_id < id_base || masked_id >= id_base + id_len ) + continue; + + phandle_node = dt_find_node_by_phandle(phandle); + if ( !phandle_node ) + return -ENODEV; + + if ( target ) + { + if ( !*target ) + *target = phandle_node; + + if ( *target != phandle_node ) + continue; + } + + if ( id_out ) + *id_out = masked_id - id_base + out_base; + + printk(XENLOG_DEBUG "%pOF: %s, using mask %08x, id-base: %08x, out-base: %08x, length: %08x, id: %08x -> %08x\n", + np, map_name, map_mask, id_base, out_base, id_len, id, + masked_id - id_base + out_base); + return 0; + } + + printk(XENLOG_ERR "%pOF: no %s translation for id 0x%x on %pOF\n", + np, map_name, id, target && *target ? *target : NULL); + + /* + * NOTE: Linux bypasses translation without returning an error here, + * but should we behave in the same way on Xen? Restrict for now. + */ + return -EFAULT; +} + +int iommu_add_dt_pci_device(uint8_t devfn, struct pci_dev *pdev) +{ + const struct iommu_ops *ops = iommu_get_ops(); + struct dt_phandle_args iommu_spec = { .args_count = 1 }; + struct device *dev = pci_to_dev(pdev); + const struct dt_device_node *np; + int rc = NO_IOMMU; + + if ( !iommu_enabled ) + return NO_IOMMU; + + if ( !ops ) + return -EINVAL; + + if ( device_is_protected(dev) ) + return 0; + + if ( dev_iommu_fwspec_get(dev) ) + return -EEXIST; + + np = pci_find_host_bridge_node(pdev); + if ( !np ) + return -ENODEV; + + /* + * According to the Documentation/devicetree/bindings/pci/pci-iommu.txt + * from Linux. + */ + rc = iommu_dt_pci_map_id(np, PCI_BDF(pdev->bus, devfn), "iommu-map", + "iommu-map-mask", &iommu_spec.np, iommu_spec.args); + if ( rc ) + return rc == -ENODEV ? NO_IOMMU : rc; + + /* + * The driver which supports generic PCI-IOMMU DT bindings must have + * these callback implemented. + */ + if ( !ops->add_device || !ops->dt_xlate ) + return -EINVAL; + + rc = iommu_dt_xlate(dev, &iommu_spec); + + /* + * Add master device to the IOMMU if latter is present and available. + * The driver is responsible to mark that device as protected. + */ + if ( !rc ) + rc = ops->add_device(devfn, dev); + + if ( rc < 0 ) + iommu_fwspec_free(dev); + + return rc; +} +#endif /* CONFIG_HAS_PCI */ + int iommu_add_dt_device(struct dt_device_node *np) { const struct iommu_ops *ops = iommu_get_ops(); diff --git a/xen/include/xen/device_tree.h b/xen/include/xen/device_tree.h index c1e4751a581f..dc40fdfb9231 100644 --- a/xen/include/xen/device_tree.h +++ b/xen/include/xen/device_tree.h @@ -852,6 +852,31 @@ int dt_count_phandle_with_args(const struct dt_device_node *np, */ int dt_get_pci_domain_nr(struct dt_device_node *node); +#ifdef CONFIG_HAS_PCI +/** + * iommu_dt_pci_map_id - Translate an ID through a downstream mapping. + * @np: root complex device node. + * @id: device ID to map. + * @map_name: property name of the map to use. + * @map_mask_name: optional property name of the mask to use. + * @target: optional pointer to a target device node. + * @id_out: optional pointer to receive the translated ID. + * + * Given a device ID, look up the appropriate implementation-defined + * platform ID and/or the target device which receives transactions on that + * ID, as per the "iommu-map" and "msi-map" bindings. Either of @target or + * @id_out may be NULL if only the other is required. If @target points to + * a non-NULL device node pointer, only entries targeting that node will be + * matched; if it points to a NULL value, it will receive the device node of + * the first matching target phandle, with a reference held. + * + * Return: 0 on success or a standard error code on failure. + */ +int iommu_dt_pci_map_id(const struct dt_device_node *np, uint32_t id, + const char *map_name, const char *map_mask_name, + struct dt_device_node **target, uint32_t *id_out); +#endif /* CONFIG_HAS_PCI */ + struct dt_device_node *dt_find_node_by_phandle(dt_phandle handle); #ifdef CONFIG_DEVICE_TREE_DEBUG diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h index 405db59971c5..d1b91ec13056 100644 --- a/xen/include/xen/iommu.h +++ b/xen/include/xen/iommu.h @@ -219,7 +219,8 @@ int iommu_dt_domain_init(struct domain *d); int iommu_release_dt_devices(struct domain *d); /* - * Helper to add master device to the IOMMU using generic IOMMU DT bindings. + * Helpers to add master device to the IOMMU using generic (PCI-)IOMMU + * DT bindings. * * Return values: * 0 : device is protected by an IOMMU @@ -228,6 +229,9 @@ int iommu_release_dt_devices(struct domain *d); * (IOMMU is not enabled/present or device is not connected to it). */ int iommu_add_dt_device(struct dt_device_node *np); +#ifdef CONFIG_HAS_PCI +int iommu_add_dt_pci_device(uint8_t devfn, struct pci_dev *pdev); +#endif int iommu_do_dt_domctl(struct xen_domctl *, struct domain *, XEN_GUEST_HANDLE_PARAM(xen_domctl_t));