diff mbox series

[v1,3/6] iommu/arm: Introduce iommu_add_dt_pci_device API

Message ID 20230501200305.168058-4-stewart.hildebrand@amd.com (mailing list archive)
State Superseded
Headers show
Series SMMU handling for PCIe Passthrough on ARM | expand

Commit Message

Stewart Hildebrand May 1, 2023, 8:03 p.m. UTC
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

The main purpose of this patch is to add a way to register PCI device
(which is behind the IOMMU) using the generic PCI-IOMMU DT bindings [1]
before assigning that device to a domain.

This behaves in almost the same way as existing iommu_add_dt_device API,
the difference is in devices to handle and DT bindings to use.

The function of_map_id to translate an ID through a downstream mapping
(which is also suitable for mapping Requester ID) was borrowed from Linux
(v5.10-rc6) and updated according to the Xen code base.

XXX: I don't port pci_for_each_dma_alias from Linux which is a part
of PCI-IOMMU bindings infrastucture as I don't have a good understanding
for how it is expected to work in Xen environment.
Also it is not completely clear whether we need to distinguish between
different PCI types here (DEV_TYPE_PCI, DEV_TYPE_PCI_HOST_BRIDGE, etc).
For example, how we should behave here if the host bridge doesn't have
a stream ID (so not described in iommu-map property) just simple
fail or bypasses translation?

[1] https://www.kernel.org/doc/Documentation/devicetree/bindings/pci/pci-iommu.txt

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
---
downstream->v1:
* rebase
* add const qualifier to struct dt_device_node *np arg in dt_map_id()
* add const qualifier to struct dt_device_node *np declaration in iommu_add_pci_device()
* use stdint.h types instead of u8/u32/etc...
* rename functions:
  s/dt_iommu_xlate/iommu_dt_xlate/
  s/dt_map_id/iommu_dt_pci_map_id/
  s/iommu_add_pci_device/iommu_add_dt_pci_device/
* add device_is_protected check in iommu_add_dt_pci_device
* wrap prototypes in CONFIG_HAS_PCI

(cherry picked from commit 734e3bf6ee77e7947667ab8fa96c25b349c2e1da from
 the downstream branch poc/pci-passthrough from
 https://gitlab.com/xen-project/people/bmarquis/xen-arm-poc.git)
---
 xen/drivers/passthrough/device_tree.c | 145 ++++++++++++++++++++++++++
 xen/include/xen/device_tree.h         |  25 +++++
 xen/include/xen/iommu.h               |   6 +-
 3 files changed, 175 insertions(+), 1 deletion(-)

Comments

Jan Beulich May 2, 2023, 7:44 a.m. UTC | #1
On 01.05.2023 22:03, Stewart Hildebrand wrote:
> @@ -228,6 +229,9 @@ int iommu_release_dt_devices(struct domain *d);
>   *      (IOMMU is not enabled/present or device is not connected to it).
>   */
>  int iommu_add_dt_device(struct dt_device_node *np);
> +#ifdef CONFIG_HAS_PCI
> +int iommu_add_dt_pci_device(uint8_t devfn, struct pci_dev *pdev);
> +#endif

Why the first parameter? Doesn't the 2nd one describe the device in full?
If this is about phantom devices, then I'd expect the function to take
care of those (like iommu_add_device() does), rather than its caller.

Jan
Stewart Hildebrand May 4, 2023, 9:54 p.m. UTC | #2
On 5/2/23 03:44, Jan Beulich wrote:
> On 01.05.2023 22:03, Stewart Hildebrand wrote:
>> @@ -228,6 +229,9 @@ int iommu_release_dt_devices(struct domain *d);
>>   *      (IOMMU is not enabled/present or device is not connected to it).
>>   */
>>  int iommu_add_dt_device(struct dt_device_node *np);
>> +#ifdef CONFIG_HAS_PCI
>> +int iommu_add_dt_pci_device(uint8_t devfn, struct pci_dev *pdev);
>> +#endif
> 
> Why the first parameter? Doesn't the 2nd one describe the device in full?

It's related to phantom device/function handling, although this series unfortunately does not properly handle phantom devices.

> If this is about phantom devices, then I'd expect the function to take
> care of those (like iommu_add_device() does), rather than its caller.

In the next patch ("[PATCH v1 4/6] pci/arm: Use iommu_add_dt_pci_device() instead of arch hook"), we will invoke iommu_add_dt_pci_device(devfn, pdev) from iommu_add_device(). Since iommu_add_device() iterates over the phantom functions, it would be redundant to also have such a loop inside of iommu_add_dt_pci_device().

If we are to properly handle phantom devices on ARM, the SMMU drivers (smmu.c/smmu-v3.c) would need some more work. In patches 5/6 and 6/6 in this series, we have:

if ( devfn != pdev->devfn )
    return -EOPNOTSUPP;

Also, the ARM SMMU drivers in Xen currently only support a single AXI stream ID per device, so some development would need to occur in order to support phantom devices.

Should phantom device support be part of this series, or would it be acceptable to introduce phantom device support on ARM as part of a future series?


Lastly, I'd like to check my understanding since phantom devices are new to me. Here's my understanding:

A phantom device is a device that advertises itself as single function, but actually has multiple phantom functions. These phantom functions will have unique requestor IDs (RID). The RID is essentially the BDF. To use a phantom device with Xen, we specify the pci-phantom command line option, and we identify phantom devices/functions in code by devfn != pdev->devfn.

On ARM, we need to map/translate a BDF to an AXI stream ID in order for the SMMU to identify the device and apply translation. That BDF -> stream ID mapping is defined by the iommu-map/iommu-map-mask property in the device tree [1]. The BDF -> AXI stream ID mapping in DT could allow phantom devices (i.e. devices with phantom functions) to use different stream IDs based on the (phantom) function.

So, in theory, on ARM, there is a possibility we may have a device that advertises itself as single function, but will issue AXI transactions with multiple different AXI stream IDs due to phantom functions. In this case, we will want each AXI stream ID to be programmed into the SMMU to avoid SMMU faults.

Please correct me if I've misunderstood anything.

[1] https://www.kernel.org/doc/Documentation/devicetree/bindings/pci/pci-iommu.txt
Jan Beulich May 5, 2023, 6:20 a.m. UTC | #3
On 04.05.2023 23:54, Stewart Hildebrand wrote:
> On 5/2/23 03:44, Jan Beulich wrote:
>> On 01.05.2023 22:03, Stewart Hildebrand wrote:
>>> @@ -228,6 +229,9 @@ int iommu_release_dt_devices(struct domain *d);
>>>   *      (IOMMU is not enabled/present or device is not connected to it).
>>>   */
>>>  int iommu_add_dt_device(struct dt_device_node *np);
>>> +#ifdef CONFIG_HAS_PCI
>>> +int iommu_add_dt_pci_device(uint8_t devfn, struct pci_dev *pdev);
>>> +#endif
>>
>> Why the first parameter? Doesn't the 2nd one describe the device in full?
> 
> It's related to phantom device/function handling, although this series unfortunately does not properly handle phantom devices.
> 
>> If this is about phantom devices, then I'd expect the function to take
>> care of those (like iommu_add_device() does), rather than its caller.
> 
> In the next patch ("[PATCH v1 4/6] pci/arm: Use iommu_add_dt_pci_device() instead of arch hook"), we will invoke iommu_add_dt_pci_device(devfn, pdev) from iommu_add_device().

Which I think I said there already I consider wrong.

> Since iommu_add_device() iterates over the phantom functions, it would be redundant to also have such a loop inside of iommu_add_dt_pci_device().
> 
> If we are to properly handle phantom devices on ARM, the SMMU drivers (smmu.c/smmu-v3.c) would need some more work. In patches 5/6 and 6/6 in this series, we have:
> 
> if ( devfn != pdev->devfn )
>     return -EOPNOTSUPP;
> 
> Also, the ARM SMMU drivers in Xen currently only support a single AXI stream ID per device, so some development would need to occur in order to support phantom devices.
> 
> Should phantom device support be part of this series, or would it be acceptable to introduce phantom device support on ARM as part of a future series?

I wouldn't view this as a strict requirement, so long as it is made clear in
respective patch descriptions.

> Lastly, I'd like to check my understanding since phantom devices are new to me. Here's my understanding:
> 
> A phantom device is a device that advertises itself as single function, but actually has multiple phantom functions. These phantom functions will have unique requestor IDs (RID). The RID is essentially the BDF. To use a phantom device with Xen, we specify the pci-phantom command line option, and we identify phantom devices/functions in code by devfn != pdev->devfn.

The command line option is there only to work around errata, i.e. devices
behaving as if they had phantom functions without advertising themselves
as such. See our use of PCI_EXP_DEVCAP_PHANTOM. As you can see, this being
PCIe only, and legacy PCI device bevaing this way would require use of the
command line option.

> On ARM, we need to map/translate a BDF to an AXI stream ID in order for the SMMU to identify the device and apply translation. That BDF -> stream ID mapping is defined by the iommu-map/iommu-map-mask property in the device tree [1]. The BDF -> AXI stream ID mapping in DT could allow phantom devices (i.e. devices with phantom functions) to use different stream IDs based on the (phantom) function.
> 
> So, in theory, on ARM, there is a possibility we may have a device that advertises itself as single function, but will issue AXI transactions with multiple different AXI stream IDs due to phantom functions. In this case, we will want each AXI stream ID to be programmed into the SMMU to avoid SMMU faults.

Right, which of course first requires that you know the mapping between
these IDs.

Jan

> Please correct me if I've misunderstood anything.
> 
> [1] https://www.kernel.org/doc/Documentation/devicetree/bindings/pci/pci-iommu.txt
Stewart Hildebrand May 8, 2023, 2:12 p.m. UTC | #4
On 5/5/23 02:20, Jan Beulich wrote:
> On 04.05.2023 23:54, Stewart Hildebrand wrote:
>> On 5/2/23 03:44, Jan Beulich wrote:
>>> On 01.05.2023 22:03, Stewart Hildebrand wrote:
>>>> @@ -228,6 +229,9 @@ int iommu_release_dt_devices(struct domain *d);
>>>>   *      (IOMMU is not enabled/present or device is not connected to it).
>>>>   */
>>>>  int iommu_add_dt_device(struct dt_device_node *np);
>>>> +#ifdef CONFIG_HAS_PCI
>>>> +int iommu_add_dt_pci_device(uint8_t devfn, struct pci_dev *pdev);
>>>> +#endif
>>>
>>> Why the first parameter? Doesn't the 2nd one describe the device in full?
>>
>> It's related to phantom device/function handling, although this series unfortunately does not properly handle phantom devices.
>>
>>> If this is about phantom devices, then I'd expect the function to take
>>> care of those (like iommu_add_device() does), rather than its caller.
>>
>> In the next patch ("[PATCH v1 4/6] pci/arm: Use iommu_add_dt_pci_device() instead of arch hook"), we will invoke iommu_add_dt_pci_device(devfn, pdev) from iommu_add_device().
> 
> Which I think I said there already I consider wrong.

Okay, I'm following now. Right, so, the first parameter is not needed.

>> Since iommu_add_device() iterates over the phantom functions, it would be redundant to also have such a loop inside of iommu_add_dt_pci_device().
>>
>> If we are to properly handle phantom devices on ARM, the SMMU drivers (smmu.c/smmu-v3.c) would need some more work. In patches 5/6 and 6/6 in this series, we have:
>>
>> if ( devfn != pdev->devfn )
>>     return -EOPNOTSUPP;
>>
>> Also, the ARM SMMU drivers in Xen currently only support a single AXI stream ID per device, so some development would need to occur in order to support phantom devices.

I spoke too soon, the SMMU drivers do appear to support multiple IDs.

>>
>> Should phantom device support be part of this series, or would it be acceptable to introduce phantom device support on ARM as part of a future series?
> 
> I wouldn't view this as a strict requirement, so long as it is made clear in
> respective patch descriptions.

After doing a little more investigation, I should be able to add in the plumbing to iterate over the phantom functions in iommu_add_dt_pci_device(). Stay tuned for v2 of this series.

>> Lastly, I'd like to check my understanding since phantom devices are new to me. Here's my understanding:
>>
>> A phantom device is a device that advertises itself as single function, but actually has multiple phantom functions. These phantom functions will have unique requestor IDs (RID). The RID is essentially the BDF. To use a phantom device with Xen, we specify the pci-phantom command line option, and we identify phantom devices/functions in code by devfn != pdev->devfn.
> 
> The command line option is there only to work around errata, i.e. devices
> behaving as if they had phantom functions without advertising themselves
> as such. See our use of PCI_EXP_DEVCAP_PHANTOM. As you can see, this being
> PCIe only, and legacy PCI device bevaing this way would require use of the
> command line option.
> 
>> On ARM, we need to map/translate a BDF to an AXI stream ID in order for the SMMU to identify the device and apply translation. That BDF -> stream ID mapping is defined by the iommu-map/iommu-map-mask property in the device tree [1]. The BDF -> AXI stream ID mapping in DT could allow phantom devices (i.e. devices with phantom functions) to use different stream IDs based on the (phantom) function.
>>
>> So, in theory, on ARM, there is a possibility we may have a device that advertises itself as single function, but will issue AXI transactions with multiple different AXI stream IDs due to phantom functions. In this case, we will want each AXI stream ID to be programmed into the SMMU to avoid SMMU faults.
> 
> Right, which of course first requires that you know the mapping between
> these IDs.

Thanks for clarifying!

Stewart
diff mbox series

Patch

diff --git a/xen/drivers/passthrough/device_tree.c b/xen/drivers/passthrough/device_tree.c
index 1b50f4670944..ef98f343eef2 100644
--- a/xen/drivers/passthrough/device_tree.c
+++ b/xen/drivers/passthrough/device_tree.c
@@ -151,6 +151,151 @@  static int iommu_dt_xlate(struct device *dev,
     return ops->dt_xlate(dev, iommu_spec);
 }
 
+#ifdef CONFIG_HAS_PCI
+int iommu_dt_pci_map_id(const struct dt_device_node *np, uint32_t id,
+                        const char *map_name, const char *map_mask_name,
+                        struct dt_device_node **target, uint32_t *id_out)
+{
+    uint32_t map_mask, masked_id, map_len;
+    const __be32 *map = NULL;
+
+    if ( !np || !map_name || (!target && !id_out) )
+        return -EINVAL;
+
+    map = dt_get_property(np, map_name, &map_len);
+    if ( !map )
+    {
+        if ( target )
+            return -ENODEV;
+        /* Otherwise, no map implies no translation */
+        *id_out = id;
+        return 0;
+    }
+
+    if ( !map_len || map_len % (4 * sizeof(*map)) )
+    {
+        printk(XENLOG_ERR "%pOF: Error: Bad %s length: %d\n", np,
+            map_name, map_len);
+        return -EINVAL;
+    }
+
+    /* The default is to select all bits. */
+    map_mask = 0xffffffff;
+
+    /*
+     * Can be overridden by "{iommu,msi}-map-mask" property.
+     * If of_property_read_u32() fails, the default is used.
+     */
+    if ( map_mask_name )
+        dt_property_read_u32(np, map_mask_name, &map_mask);
+
+    masked_id = map_mask & id;
+    for ( ; (int)map_len > 0; map_len -= 4 * sizeof(*map), map += 4 )
+    {
+        struct dt_device_node *phandle_node;
+        uint32_t id_base = be32_to_cpup(map + 0);
+        uint32_t phandle = be32_to_cpup(map + 1);
+        uint32_t out_base = be32_to_cpup(map + 2);
+        uint32_t id_len = be32_to_cpup(map + 3);
+
+        if ( id_base & ~map_mask )
+        {
+            printk(XENLOG_ERR "%pOF: Invalid %s translation - %s-mask (0x%x) ignores id-base (0x%x)\n",
+                   np, map_name, map_name, map_mask, id_base);
+            return -EFAULT;
+        }
+
+        if ( masked_id < id_base || masked_id >= id_base + id_len )
+            continue;
+
+        phandle_node = dt_find_node_by_phandle(phandle);
+        if ( !phandle_node )
+            return -ENODEV;
+
+        if ( target )
+        {
+            if ( !*target )
+                *target = phandle_node;
+
+            if ( *target != phandle_node )
+                continue;
+        }
+
+        if ( id_out )
+            *id_out = masked_id - id_base + out_base;
+
+        printk(XENLOG_DEBUG "%pOF: %s, using mask %08x, id-base: %08x, out-base: %08x, length: %08x, id: %08x -> %08x\n",
+               np, map_name, map_mask, id_base, out_base, id_len, id,
+               masked_id - id_base + out_base);
+        return 0;
+    }
+
+    printk(XENLOG_ERR "%pOF: no %s translation for id 0x%x on %pOF\n",
+           np, map_name, id, target && *target ? *target : NULL);
+
+    /*
+     * NOTE: Linux bypasses translation without returning an error here,
+     * but should we behave in the same way on Xen? Restrict for now.
+     */
+    return -EFAULT;
+}
+
+int iommu_add_dt_pci_device(uint8_t devfn, struct pci_dev *pdev)
+{
+    const struct iommu_ops *ops = iommu_get_ops();
+    struct dt_phandle_args iommu_spec = { .args_count = 1 };
+    struct device *dev = pci_to_dev(pdev);
+    const struct dt_device_node *np;
+    int rc = NO_IOMMU;
+
+    if ( !iommu_enabled )
+        return NO_IOMMU;
+
+    if ( !ops )
+        return -EINVAL;
+
+    if ( device_is_protected(dev) )
+        return 0;
+
+    if ( dev_iommu_fwspec_get(dev) )
+        return -EEXIST;
+
+    np = pci_find_host_bridge_node(pdev);
+    if ( !np )
+        return -ENODEV;
+
+    /*
+     * According to the Documentation/devicetree/bindings/pci/pci-iommu.txt
+     * from Linux.
+     */
+    rc = iommu_dt_pci_map_id(np, PCI_BDF(pdev->bus, devfn), "iommu-map",
+                             "iommu-map-mask", &iommu_spec.np, iommu_spec.args);
+    if ( rc )
+        return rc == -ENODEV ? NO_IOMMU : rc;
+
+    /*
+     * The driver which supports generic PCI-IOMMU DT bindings must have
+     * these callback implemented.
+     */
+    if ( !ops->add_device || !ops->dt_xlate )
+        return -EINVAL;
+
+    rc = iommu_dt_xlate(dev, &iommu_spec);
+
+    /*
+     * Add master device to the IOMMU if latter is present and available.
+     * The driver is responsible to mark that device as protected.
+     */
+    if ( !rc )
+        rc = ops->add_device(devfn, dev);
+
+    if ( rc < 0 )
+        iommu_fwspec_free(dev);
+
+    return rc;
+}
+#endif /* CONFIG_HAS_PCI */
+
 int iommu_add_dt_device(struct dt_device_node *np)
 {
     const struct iommu_ops *ops = iommu_get_ops();
diff --git a/xen/include/xen/device_tree.h b/xen/include/xen/device_tree.h
index c1e4751a581f..dc40fdfb9231 100644
--- a/xen/include/xen/device_tree.h
+++ b/xen/include/xen/device_tree.h
@@ -852,6 +852,31 @@  int dt_count_phandle_with_args(const struct dt_device_node *np,
  */
 int dt_get_pci_domain_nr(struct dt_device_node *node);
 
+#ifdef CONFIG_HAS_PCI
+/**
+ * iommu_dt_pci_map_id - Translate an ID through a downstream mapping.
+ * @np: root complex device node.
+ * @id: device ID to map.
+ * @map_name: property name of the map to use.
+ * @map_mask_name: optional property name of the mask to use.
+ * @target: optional pointer to a target device node.
+ * @id_out: optional pointer to receive the translated ID.
+ *
+ * Given a device ID, look up the appropriate implementation-defined
+ * platform ID and/or the target device which receives transactions on that
+ * ID, as per the "iommu-map" and "msi-map" bindings. Either of @target or
+ * @id_out may be NULL if only the other is required. If @target points to
+ * a non-NULL device node pointer, only entries targeting that node will be
+ * matched; if it points to a NULL value, it will receive the device node of
+ * the first matching target phandle, with a reference held.
+ *
+ * Return: 0 on success or a standard error code on failure.
+ */
+int iommu_dt_pci_map_id(const struct dt_device_node *np, uint32_t id,
+                        const char *map_name, const char *map_mask_name,
+                        struct dt_device_node **target, uint32_t *id_out);
+#endif /* CONFIG_HAS_PCI */
+
 struct dt_device_node *dt_find_node_by_phandle(dt_phandle handle);
 
 #ifdef CONFIG_DEVICE_TREE_DEBUG
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index 405db59971c5..d1b91ec13056 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -219,7 +219,8 @@  int iommu_dt_domain_init(struct domain *d);
 int iommu_release_dt_devices(struct domain *d);
 
 /*
- * Helper to add master device to the IOMMU using generic IOMMU DT bindings.
+ * Helpers to add master device to the IOMMU using generic (PCI-)IOMMU
+ * DT bindings.
  *
  * Return values:
  *  0 : device is protected by an IOMMU
@@ -228,6 +229,9 @@  int iommu_release_dt_devices(struct domain *d);
  *      (IOMMU is not enabled/present or device is not connected to it).
  */
 int iommu_add_dt_device(struct dt_device_node *np);
+#ifdef CONFIG_HAS_PCI
+int iommu_add_dt_pci_device(uint8_t devfn, struct pci_dev *pdev);
+#endif
 
 int iommu_do_dt_domctl(struct xen_domctl *, struct domain *,
                        XEN_GUEST_HANDLE_PARAM(xen_domctl_t));