
[QEMU,v8] xen/passthrough: use gsi to map pirq when dom0 is PVH

Message ID 20241016062827.2301004-1-Jiqian.Chen@amd.com (mailing list archive)
State Superseded
Series [QEMU,v8] xen/passthrough: use gsi to map pirq when dom0 is PVH

Commit Message

Chen, Jiqian Oct. 16, 2024, 6:28 a.m. UTC
In PVH dom0, when passing through a device to a domU, the QEMU code path
xen_pt_realize->xc_physdev_map_pirq expects a GSI, but in the current code
the number is read from /sys/bus/pci/devices/<sbdf>/irq. That is wrong: the
IRQ is not equal to the GSI (they live in different number spaces), so the
pirq mapping fails.

To solve this, use Xen's new interface xc_pcidev_get_gsi to obtain the GSI,
and use xc_physdev_map_pirq_gsi to map the pirq when dom0 is PVH.

Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Jiqian Chen <Jiqian.Chen@amd.com>
---
Hi All,
This is v8 of the series to support device passthrough on Xen when dom0 is PVH.
v7->v8 changes:
* xc_physdev_gsi_from_dev was renamed to xc_pcidev_get_gsi; updated the call accordingly.
* Added xen_run_qemu_on_hvm to check whether QEMU runs on a PV dom0; if not, use xc_physdev_map_pirq_gsi to map the pirq.
* Used CONFIG_XEN_CTRL_INTERFACE_VERSION to guard the new code for compatibility.
* Added "#define DOMID_RUN_QEMU 0" to represent the ID of the domain that QEMU runs in.


Best regards,
Jiqian Chen



v6->v7 changes:
* The function for obtaining the GSI was changed on the kernel and Xen side; switched to
  xc_physdev_gsi_from_dev, which takes an SBDF instead of an IRQ.

v5->v6 changes:
* The function for obtaining the GSI was changed on the kernel and Xen side; switched to
  xc_physdev_gsi_from_irq instead of the gsi sysfs file.
* Since the function changed, removed Stefano's Reviewed-by.

v4->v5 changes:
* Added Stefano's Reviewed-by.

v3->v4 changes:
* Added gsi to struct XenHostPCIDevice and used the GSI number read from the gsi sysfs
  file when it exists; if there is no gsi sysfs file, still use irq.

v2->v3 changes:
* Due to changes in the second kernel-side patch (which adds a new sysfs file for the GSI
  instead of a new syscall), read the GSI number from that sysfs file.

v1 and v2:
We can record the relation between GSI and IRQ, and when userspace (QEMU) wants to use the
GSI we can translate it. The third kernel patch (xen/privcmd: Add new syscall to get gsi
from irq) records all the relations in acpi_register_gsi_xen_pvh() when dom0 initializes
PCI devices, and provides a syscall for userspace to get the GSI from the IRQ. The third
Xen patch (tools: Add new function to get gsi from irq) adds a new function,
xc_physdev_gsi_from_irq(), that calls the new kernel syscall. Userspace can then use that
function to get the GSI, and xc_physdev_map_pirq() succeeds.

Issues we encountered:
1. Failed to map pirq for GSI
Problem: QEMU calls xc_physdev_map_pirq() in xen_pt_realize() to map a passthrough
device's GSI to a pirq, but the call fails.

Reason: xc_physdev_map_pirq() needs a GSI rather than an IRQ, but QEMU passes the IRQ and
treats it as a GSI; the value is read from /sys/bus/pci/devices/xxxx:xx:xx.x/irq in
xen_host_pci_device_get(). The GSI number is not equal to the IRQ; they are in different
number spaces.
---
 hw/xen/xen_pt.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
 hw/xen/xen_pt.h |  1 +
 2 files changed, 45 insertions(+)

Comments

Stewart Hildebrand Oct. 21, 2024, 8:55 p.m. UTC | #1
On 10/16/24 02:28, Jiqian Chen wrote:
> [...]
> diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c
> index 3635d1b39f79..7f8139d20915 100644
> --- a/hw/xen/xen_pt.c
> +++ b/hw/xen/xen_pt.c
> @@ -766,6 +766,41 @@ static void xen_pt_destroy(PCIDevice *d) {
>  }
>  /* init */
>  
> +#define PCI_SBDF(seg, bus, dev, func) \
> +            ((((uint32_t)(seg)) << 16) | \
> +            (PCI_BUILD_BDF(bus, PCI_DEVFN(dev, func))))

Nit: This macro looks generic and useful. Would it be better defined in
include/hw/pci/pci.h?
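
A minimal sketch of how such a generic helper could look if it were hoisted into
include/hw/pci/pci.h (hypothetical placement and usage example, not part of this patch):

/*
 * Pack segment:bus:device.function into the 32-bit SBDF layout Xen expects:
 * segment in bits 31:16, bus/devfn in bits 15:0. For example,
 * PCI_SBDF(0, 0x03, 0x00, 0) evaluates to 0x00000300 for device 0000:03:00.0.
 */
#define PCI_SBDF(seg, bus, dev, func) \
    ((((uint32_t)(seg)) << 16) | \
     PCI_BUILD_BDF((bus), PCI_DEVFN((dev), (func))))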

> +
> +#if CONFIG_XEN_CTRL_INTERFACE_VERSION >= 42000
> +static bool xen_run_qemu_on_hvm(void)

This function name seems to imply "is qemu running on HVM?", but I think
the question we're really trying to answer is whether the pcidev needs
a GSI mapped. How about calling the function "xen_pt_needs_gsi" or
similar?

> +{
> +    xc_domaininfo_t info;
> +
> +    if (!xc_domain_getinfo_single(xen_xc, DOMID_RUN_QEMU, &info) &&
> +        (info.flags & XEN_DOMINF_hvm_guest)) {

I think reading /sys/hypervisor/guest_type would allow you to get the
same information without another hypercall.
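
A rough sketch of that alternative, assuming QEMU's usual osdep.h/glib helpers and that
/sys/hypervisor/guest_type reports "PV", "HVM" or "PVH"; the name xen_pt_needs_gsi follows
the rename suggested above and is an assumption, not the final implementation:

static bool xen_pt_needs_gsi(void)
{
    g_autofree char *type = NULL;

    /* No sysfs node: assume a classic PV dom0 and keep the existing path. */
    if (!g_file_get_contents("/sys/hypervisor/guest_type", &type, NULL, NULL)) {
        return false;
    }
    g_strstrip(type);
    /* Anything other than plain PV (PVH dom0, HVM stubdomain, ...) needs the
     * GSI-based mapping. */
    return strcmp(type, "PV") != 0;
}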

> +        return true;
> +    }
> +
> +    return false;
> +}
> +
> +static int xen_map_pirq_for_gsi(PCIDevice *d, int *pirq)

Nit: s/xen_/xen_pt_/

> +{
> +    int gsi;
> +    XenPCIPassthroughState *s = XEN_PT_DEVICE(d);
> +
> +    gsi = xc_pcidev_get_gsi(xen_xc,
> +                            PCI_SBDF(s->real_device.domain,
> +                                     s->real_device.bus,
> +                                     s->real_device.dev,
> +                                     s->real_device.func));
> +    if (gsi >= 0) {
> +        return xc_physdev_map_pirq_gsi(xen_xc, xen_domid, gsi, pirq);
> +    }
> +
> +    return gsi;
> +}
> +#endif
> +
>  static void xen_pt_realize(PCIDevice *d, Error **errp)
>  {
>      ERRP_GUARD();
> @@ -847,7 +882,16 @@ static void xen_pt_realize(PCIDevice *d, Error **errp)
>          goto out;
>      }
>  
> +#if CONFIG_XEN_CTRL_INTERFACE_VERSION >= 42000
> +    if (xen_run_qemu_on_hvm()) {
> +        rc = xen_map_pirq_for_gsi(d, &pirq);
> +    } else {
> +        rc = xc_physdev_map_pirq(xen_xc, xen_domid, machine_irq, &pirq);
> +    }
> +#else
>      rc = xc_physdev_map_pirq(xen_xc, xen_domid, machine_irq, &pirq);
> +#endif
> +
>      if (rc < 0) {
>          XEN_PT_ERR(d, "Mapping machine irq %u to pirq %i failed, (err: %d)\n",
>                     machine_irq, pirq, errno);
> diff --git a/hw/xen/xen_pt.h b/hw/xen/xen_pt.h
> index 095a0f0365d4..a08b45b7fbae 100644
> --- a/hw/xen/xen_pt.h
> +++ b/hw/xen/xen_pt.h
> @@ -36,6 +36,7 @@ void xen_pt_log(const PCIDevice *d, const char *f, ...) G_GNUC_PRINTF(2, 3);
>  #  define XEN_PT_LOG_CONFIG(d, addr, val, len)
>  #endif
>  
> +#define DOMID_RUN_QEMU 0
>  
>  /* Helper */
>  #define XEN_PFN(x) ((x) >> XC_PAGE_SHIFT)
Marek Marczykowski-Górecki Oct. 22, 2024, 11:07 p.m. UTC | #2
On Wed, Oct 16, 2024 at 02:28:27PM +0800, Jiqian Chen wrote:
> --- a/hw/xen/xen_pt.h
> +++ b/hw/xen/xen_pt.h
> @@ -36,6 +36,7 @@ void xen_pt_log(const PCIDevice *d, const char *f, ...) G_GNUC_PRINTF(2, 3);
>  #  define XEN_PT_LOG_CONFIG(d, addr, val, len)
>  #endif
>  
> +#define DOMID_RUN_QEMU 0

Please, don't hardcode dom0 here, QEMU can be running in a stubdomain
too (in which case, this will hilariously explode, as it will check what
dom0 is, instead of where QEMU is running). 
Stewart already provided an alternative.
Chen, Jiqian Oct. 23, 2024, 9:29 a.m. UTC | #3
On 2024/10/22 04:55, Stewart Hildebrand wrote:
> On 10/16/24 02:28, Jiqian Chen wrote:
>> [...]
>>
>> +#define PCI_SBDF(seg, bus, dev, func) \
>> +            ((((uint32_t)(seg)) << 16) | \
>> +            (PCI_BUILD_BDF(bus, PCI_DEVFN(dev, func))))
>
> Nit: This macro looks generic and useful. Would it be better defined in
> include/hw/pci/pci.h?
>
>> +#if CONFIG_XEN_CTRL_INTERFACE_VERSION >= 42000
>> +static bool xen_run_qemu_on_hvm(void)
>
> This function name seems to imply "is qemu running on HVM?", but I think
> the question we're really trying to answer is whether the pcidev needs
> a GSI mapped. How about calling the function "xen_pt_needs_gsi" or
> similar?
>
>> +    if (!xc_domain_getinfo_single(xen_xc, DOMID_RUN_QEMU, &info) &&
>> +        (info.flags & XEN_DOMINF_hvm_guest)) {
>
> I think reading /sys/hypervisor/guest_type would allow you to get the
> same information without another hypercall.
>
>> +static int xen_map_pirq_for_gsi(PCIDevice *d, int *pirq)
>
> Nit: s/xen_/xen_pt_/
>
>> [...]
>
What you said makes more sense. I will change this patch according to your comments. Thank you very much!
Chen, Jiqian Oct. 23, 2024, 9:30 a.m. UTC | #4
On 2024/10/23 07:07, Marek Marczykowski-Górecki wrote:
> On Wed, Oct 16, 2024 at 02:28:27PM +0800, Jiqian Chen wrote:
>> --- a/hw/xen/xen_pt.h
>> +++ b/hw/xen/xen_pt.h
>> @@ -36,6 +36,7 @@ void xen_pt_log(const PCIDevice *d, const char *f, ...) G_GNUC_PRINTF(2, 3);
>>  #  define XEN_PT_LOG_CONFIG(d, addr, val, len)
>>  #endif
>>  
>> +#define DOMID_RUN_QEMU 0
> 
> Please, don't hardcode dom0 here, QEMU can be running in a stubdomain
> too (in which case, this will hilariously explode, as it will check what
> dom0 is, instead of where QEMU is running). 
> Stewart already provided an alternative.
Got it, will use Stewart's suggestion, thanks.


Patch

diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c
index 3635d1b39f79..7f8139d20915 100644
--- a/hw/xen/xen_pt.c
+++ b/hw/xen/xen_pt.c
@@ -766,6 +766,41 @@  static void xen_pt_destroy(PCIDevice *d) {
 }
 /* init */
 
+#define PCI_SBDF(seg, bus, dev, func) \
+            ((((uint32_t)(seg)) << 16) | \
+            (PCI_BUILD_BDF(bus, PCI_DEVFN(dev, func))))
+
+#if CONFIG_XEN_CTRL_INTERFACE_VERSION >= 42000
+static bool xen_run_qemu_on_hvm(void)
+{
+    xc_domaininfo_t info;
+
+    if (!xc_domain_getinfo_single(xen_xc, DOMID_RUN_QEMU, &info) &&
+        (info.flags & XEN_DOMINF_hvm_guest)) {
+        return true;
+    }
+
+    return false;
+}
+
+static int xen_map_pirq_for_gsi(PCIDevice *d, int *pirq)
+{
+    int gsi;
+    XenPCIPassthroughState *s = XEN_PT_DEVICE(d);
+
+    gsi = xc_pcidev_get_gsi(xen_xc,
+                            PCI_SBDF(s->real_device.domain,
+                                     s->real_device.bus,
+                                     s->real_device.dev,
+                                     s->real_device.func));
+    if (gsi >= 0) {
+        return xc_physdev_map_pirq_gsi(xen_xc, xen_domid, gsi, pirq);
+    }
+
+    return gsi;
+}
+#endif
+
 static void xen_pt_realize(PCIDevice *d, Error **errp)
 {
     ERRP_GUARD();
@@ -847,7 +882,16 @@  static void xen_pt_realize(PCIDevice *d, Error **errp)
         goto out;
     }
 
+#if CONFIG_XEN_CTRL_INTERFACE_VERSION >= 42000
+    if (xen_run_qemu_on_hvm()) {
+        rc = xen_map_pirq_for_gsi(d, &pirq);
+    } else {
+        rc = xc_physdev_map_pirq(xen_xc, xen_domid, machine_irq, &pirq);
+    }
+#else
     rc = xc_physdev_map_pirq(xen_xc, xen_domid, machine_irq, &pirq);
+#endif
+
     if (rc < 0) {
         XEN_PT_ERR(d, "Mapping machine irq %u to pirq %i failed, (err: %d)\n",
                    machine_irq, pirq, errno);
diff --git a/hw/xen/xen_pt.h b/hw/xen/xen_pt.h
index 095a0f0365d4..a08b45b7fbae 100644
--- a/hw/xen/xen_pt.h
+++ b/hw/xen/xen_pt.h
@@ -36,6 +36,7 @@  void xen_pt_log(const PCIDevice *d, const char *f, ...) G_GNUC_PRINTF(2, 3);
 #  define XEN_PT_LOG_CONFIG(d, addr, val, len)
 #endif
 
+#define DOMID_RUN_QEMU 0
 
 /* Helper */
 #define XEN_PFN(x) ((x) >> XC_PAGE_SHIFT)