Message ID: 20230312075455.450187-7-ray.huang@amd.com (mailing list archive)
State: New, archived
Series: Introduce VirtIO GPU and Passthrough GPU support on Xen PVH dom0
On 12.03.2023 08:54, Huang Rui wrote:
> From: Chen Jiqian <Jiqian.Chen@amd.com>
>
> Use new xc_physdev_gsi_from_irq to get the GSI number

Apart from again the "Why?", ...

> --- a/tools/libs/light/libxl_pci.c
> +++ b/tools/libs/light/libxl_pci.c
> @@ -1486,6 +1486,7 @@ static void pci_add_dm_done(libxl__egc *egc,
>              goto out_no_irq;
>          }
>          if ((fscanf(f, "%u", &irq) == 1) && irq) {
> +            irq = xc_physdev_gsi_from_irq(ctx->xch, irq);
>              r = xc_physdev_map_pirq(ctx->xch, domid, irq, &irq);
>              if (r < 0) {
>                  LOGED(ERROR, domainid, "xc_physdev_map_pirq irq=%d (error=%d)",

... aren't you breaking existing use cases this way?

Jan
On Sun, Mar 12, 2023 at 03:54:55PM +0800, Huang Rui wrote:
> From: Chen Jiqian <Jiqian.Chen@amd.com>
>
> Use new xc_physdev_gsi_from_irq to get the GSI number
>
> Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com>
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> ---
>  tools/libs/light/libxl_pci.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
> index f4c4f17545..47cf2799bf 100644
> --- a/tools/libs/light/libxl_pci.c
> +++ b/tools/libs/light/libxl_pci.c
> @@ -1486,6 +1486,7 @@ static void pci_add_dm_done(libxl__egc *egc,
>              goto out_no_irq;
>          }
>          if ((fscanf(f, "%u", &irq) == 1) && irq) {
> +            irq = xc_physdev_gsi_from_irq(ctx->xch, irq);

This is just a shot in the dark, because I don't really have enough
context to understand what's going on here, but see below.

I've taken a look at this on my box, and it seems like on dom0 the
value returned by /sys/bus/pci/devices/SBDF/irq is not very
consistent.

If devices are in use by a driver the irq sysfs node reports either
the GSI irq or the MSI IRQ (in case a single MSI interrupt is setup).

It seems like pciback in Linux does something to report the correct
value:

root@lcy2-dt107:~# cat /sys/bus/pci/devices/0000\:00\:14.0/irq
74
root@lcy2-dt107:~# xl pci-assignable-add 00:14.0
root@lcy2-dt107:~# cat /sys/bus/pci/devices/0000\:00\:14.0/irq
16

As you can see, making the device assignable changed the value
reported by the irq node to be the GSI instead of the MSI IRQ, I would
think you are missing something similar in the PVH setup (some pciback
magic)?

Albeit I have no idea why you would need to translate from IRQ to GSI
in the way you do in this and related patches, because I'm missing the
context.

Regards, Roger.
On Wed, 15 Mar 2023, Roger Pau Monné wrote: > On Sun, Mar 12, 2023 at 03:54:55PM +0800, Huang Rui wrote: > > From: Chen Jiqian <Jiqian.Chen@amd.com> > > > > Use new xc_physdev_gsi_from_irq to get the GSI number > > > > Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com> > > Signed-off-by: Huang Rui <ray.huang@amd.com> > > --- > > tools/libs/light/libxl_pci.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c > > index f4c4f17545..47cf2799bf 100644 > > --- a/tools/libs/light/libxl_pci.c > > +++ b/tools/libs/light/libxl_pci.c > > @@ -1486,6 +1486,7 @@ static void pci_add_dm_done(libxl__egc *egc, > > goto out_no_irq; > > } > > if ((fscanf(f, "%u", &irq) == 1) && irq) { > > + irq = xc_physdev_gsi_from_irq(ctx->xch, irq); > > This is just a shot in the dark, because I don't really have enough > context to understand what's going on here, but see below. > > I've taken a look at this on my box, and it seems like on > dom0 the value returned by /sys/bus/pci/devices/SBDF/irq is not > very consistent. > > If devices are in use by a driver the irq sysfs node reports either > the GSI irq or the MSI IRQ (in case a single MSI interrupt is > setup). > > It seems like pciback in Linux does something to report the correct > value: > > root@lcy2-dt107:~# cat /sys/bus/pci/devices/0000\:00\:14.0/irq > 74 > root@lcy2-dt107:~# xl pci-assignable-add 00:14.0 > root@lcy2-dt107:~# cat /sys/bus/pci/devices/0000\:00\:14.0/irq > 16 > > As you can see, making the device assignable changed the value > reported by the irq node to be the GSI instead of the MSI IRQ, I would > think you are missing something similar in the PVH setup (some pciback > magic)? > > Albeit I have no idea why you would need to translate from IRQ to GSI > in the way you do in this and related patches, because I'm missing the > context. 
As I mention in another email, also keep in mind that we need QEMU to
work and QEMU calls:
1) xc_physdev_map_pirq (this is also called from libxl)
2) xc_domain_bind_pt_pci_irq

In this case IRQ != GSI (IRQ == 112, GSI == 28). Sysfs returns the IRQ
in Linux (112), but actually xc_physdev_map_pirq expects the GSI, not
the IRQ. If you look at the implementation of xc_physdev_map_pirq,
you'll see that the type is "MAP_PIRQ_TYPE_GSI" and also see the check
in Xen xen/arch/x86/irq.c:allocate_and_map_gsi_pirq:

    if ( index < 0 || index >= nr_irqs_gsi )
    {
        dprintk(XENLOG_G_ERR, "dom%d: map invalid irq %d\n", d->domain_id,
                index);
        return -EINVAL;
    }

nr_irqs_gsi < 112, and the check will fail.

So we need to pass the GSI to xc_physdev_map_pirq. To do that, we need
to discover the GSI number corresponding to the IRQ number.
On Wed, Mar 15, 2023 at 05:44:12PM -0700, Stefano Stabellini wrote: > On Wed, 15 Mar 2023, Roger Pau Monné wrote: > > On Sun, Mar 12, 2023 at 03:54:55PM +0800, Huang Rui wrote: > > > From: Chen Jiqian <Jiqian.Chen@amd.com> > > > > > > Use new xc_physdev_gsi_from_irq to get the GSI number > > > > > > Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com> > > > Signed-off-by: Huang Rui <ray.huang@amd.com> > > > --- > > > tools/libs/light/libxl_pci.c | 1 + > > > 1 file changed, 1 insertion(+) > > > > > > diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c > > > index f4c4f17545..47cf2799bf 100644 > > > --- a/tools/libs/light/libxl_pci.c > > > +++ b/tools/libs/light/libxl_pci.c > > > @@ -1486,6 +1486,7 @@ static void pci_add_dm_done(libxl__egc *egc, > > > goto out_no_irq; > > > } > > > if ((fscanf(f, "%u", &irq) == 1) && irq) { > > > + irq = xc_physdev_gsi_from_irq(ctx->xch, irq); > > > > This is just a shot in the dark, because I don't really have enough > > context to understand what's going on here, but see below. > > > > I've taken a look at this on my box, and it seems like on > > dom0 the value returned by /sys/bus/pci/devices/SBDF/irq is not > > very consistent. > > > > If devices are in use by a driver the irq sysfs node reports either > > the GSI irq or the MSI IRQ (in case a single MSI interrupt is > > setup). > > > > It seems like pciback in Linux does something to report the correct > > value: > > > > root@lcy2-dt107:~# cat /sys/bus/pci/devices/0000\:00\:14.0/irq > > 74 > > root@lcy2-dt107:~# xl pci-assignable-add 00:14.0 > > root@lcy2-dt107:~# cat /sys/bus/pci/devices/0000\:00\:14.0/irq > > 16 > > > > As you can see, making the device assignable changed the value > > reported by the irq node to be the GSI instead of the MSI IRQ, I would > > think you are missing something similar in the PVH setup (some pciback > > magic)? 
> >
> > Albeit I have no idea why you would need to translate from IRQ to GSI
> > in the way you do in this and related patches, because I'm missing the
> > context.
>
> As I mention in another email, also keep in mind that we need QEMU to
> work and QEMU calls:
> 1) xc_physdev_map_pirq (this is also called from libxl)
> 2) xc_domain_bind_pt_pci_irq

Those would be fine, and don't need any translation since it's QEMU
the one that creates and maps the MSI(-X) interrupts, so it knows the
PIRQ without requiring any translation because it has been allocated
by QEMU itself.

GSI is kind of special because it's a fixed (legacy) interrupt mapped
to an IO-APIC pin and assigned to the device by the firmware. The
setup in that case gets done by the toolstack (libxl) because the
mapping is immutable for the lifetime of the domain.

> In this case IRQ != GSI (IRQ == 112, GSI == 28). Sysfs returns the IRQ
> in Linux (112), but actually xc_physdev_map_pirq expects the GSI, not
> the IRQ.

I think the real question here is why on this scenario IRQ != GSI for
GSI interrupts.

On one of my systems when booted as PVH dom0 with pci=nomsi I get from
/proc/interrupt:

  8:     0    0    0    0    0    0    0  IO-APIC   8-edge      rtc0
  9:     1    0    0    0    0    0    0  IO-APIC   9-fasteoi   acpi
 16:     0    0 8373    0    0    0    0  IO-APIC  16-fasteoi   i801_smbus, xhci-hcd:usb1, ahci[0000:00:17.0]
 17:     0    0    0  542    0    0    0  IO-APIC  17-fasteoi   eth0
 24:  4112    0    0    0    0    0    0  xen-percpu  -virq    timer0
 25:   352    0    0    0    0    0    0  xen-percpu  -ipi     resched0
 26:  6635    0    0    0    0    0    0  xen-percpu  -ipi     callfunc0

So GSI == IRQ, and non GSI interrupts start past the last GSI, which
is 23 on this system because it has a single IO-APIC with 24 pins.

We need to figure out what causes GSIs to be mapped to IRQs != GSI on
your system, and then we can decide how to fix this. I would expect it
could be fixed so that IRQ == GSI (like it's on PV dom0), and none of
this translation to be necessary.

Can you paste the output of /proc/interrupts on that system that has a
GSI not identity mapped to an IRQ?
> If you look at the implementation of xc_physdev_map_pirq,
> you'll the type is "MAP_PIRQ_TYPE_GSI" and also see the check in Xen
> xen/arch/x86/irq.c:allocate_and_map_gsi_pirq:
>
>     if ( index < 0 || index >= nr_irqs_gsi )
>     {
>         dprintk(XENLOG_G_ERR, "dom%d: map invalid irq %d\n", d->domain_id,
>                 index);
>         return -EINVAL;
>     }
>
> nr_irqs_gsi < 112, and the check will fail.
>
> So we need to pass the GSI to xc_physdev_map_pirq. To do that, we need
> to discover the GSI number corresponding to the IRQ number.

Right, see above, I think the real problem is that IRQ != GSI on your
Linux dom0 for some reason.

Thanks, Roger.
On 16.03.2023 01:44, Stefano Stabellini wrote: > On Wed, 15 Mar 2023, Roger Pau Monné wrote: >> On Sun, Mar 12, 2023 at 03:54:55PM +0800, Huang Rui wrote: >>> From: Chen Jiqian <Jiqian.Chen@amd.com> >>> >>> Use new xc_physdev_gsi_from_irq to get the GSI number >>> >>> Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com> >>> Signed-off-by: Huang Rui <ray.huang@amd.com> >>> --- >>> tools/libs/light/libxl_pci.c | 1 + >>> 1 file changed, 1 insertion(+) >>> >>> diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c >>> index f4c4f17545..47cf2799bf 100644 >>> --- a/tools/libs/light/libxl_pci.c >>> +++ b/tools/libs/light/libxl_pci.c >>> @@ -1486,6 +1486,7 @@ static void pci_add_dm_done(libxl__egc *egc, >>> goto out_no_irq; >>> } >>> if ((fscanf(f, "%u", &irq) == 1) && irq) { >>> + irq = xc_physdev_gsi_from_irq(ctx->xch, irq); >> >> This is just a shot in the dark, because I don't really have enough >> context to understand what's going on here, but see below. >> >> I've taken a look at this on my box, and it seems like on >> dom0 the value returned by /sys/bus/pci/devices/SBDF/irq is not >> very consistent. >> >> If devices are in use by a driver the irq sysfs node reports either >> the GSI irq or the MSI IRQ (in case a single MSI interrupt is >> setup). >> >> It seems like pciback in Linux does something to report the correct >> value: >> >> root@lcy2-dt107:~# cat /sys/bus/pci/devices/0000\:00\:14.0/irq >> 74 >> root@lcy2-dt107:~# xl pci-assignable-add 00:14.0 >> root@lcy2-dt107:~# cat /sys/bus/pci/devices/0000\:00\:14.0/irq >> 16 >> >> As you can see, making the device assignable changed the value >> reported by the irq node to be the GSI instead of the MSI IRQ, I would >> think you are missing something similar in the PVH setup (some pciback >> magic)? >> >> Albeit I have no idea why you would need to translate from IRQ to GSI >> in the way you do in this and related patches, because I'm missing the >> context. 
>
> As I mention in another email, also keep in mind that we need QEMU to
> work and QEMU calls:
> 1) xc_physdev_map_pirq (this is also called from libxl)
> 2) xc_domain_bind_pt_pci_irq
>
>
> In this case IRQ != GSI (IRQ == 112, GSI == 28). Sysfs returns the IRQ
> in Linux (112), but actually xc_physdev_map_pirq expects the GSI, not
> the IRQ. If you look at the implementation of xc_physdev_map_pirq,
> you'll the type is "MAP_PIRQ_TYPE_GSI" and also see the check in Xen
> xen/arch/x86/irq.c:allocate_and_map_gsi_pirq:
>
>     if ( index < 0 || index >= nr_irqs_gsi )
>     {
>         dprintk(XENLOG_G_ERR, "dom%d: map invalid irq %d\n", d->domain_id,
>                 index);
>         return -EINVAL;
>     }
>
> nr_irqs_gsi < 112, and the check will fail.
>
> So we need to pass the GSI to xc_physdev_map_pirq. To do that, we need
> to discover the GSI number corresponding to the IRQ number.

That's one possible approach. Another could be (making a lot of
assumptions) that a PVH Dom0 would pass in the IRQ it knows for this
interrupt and Xen then translates that to GSI, knowing that PVH
doesn't have (host) GSIs exposed to it.

Jan
On Thu, Mar 16, 2023 at 09:55:03AM +0100, Jan Beulich wrote: > On 16.03.2023 01:44, Stefano Stabellini wrote: > > On Wed, 15 Mar 2023, Roger Pau Monné wrote: > >> On Sun, Mar 12, 2023 at 03:54:55PM +0800, Huang Rui wrote: > >>> From: Chen Jiqian <Jiqian.Chen@amd.com> > >>> > >>> Use new xc_physdev_gsi_from_irq to get the GSI number > >>> > >>> Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com> > >>> Signed-off-by: Huang Rui <ray.huang@amd.com> > >>> --- > >>> tools/libs/light/libxl_pci.c | 1 + > >>> 1 file changed, 1 insertion(+) > >>> > >>> diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c > >>> index f4c4f17545..47cf2799bf 100644 > >>> --- a/tools/libs/light/libxl_pci.c > >>> +++ b/tools/libs/light/libxl_pci.c > >>> @@ -1486,6 +1486,7 @@ static void pci_add_dm_done(libxl__egc *egc, > >>> goto out_no_irq; > >>> } > >>> if ((fscanf(f, "%u", &irq) == 1) && irq) { > >>> + irq = xc_physdev_gsi_from_irq(ctx->xch, irq); > >> > >> This is just a shot in the dark, because I don't really have enough > >> context to understand what's going on here, but see below. > >> > >> I've taken a look at this on my box, and it seems like on > >> dom0 the value returned by /sys/bus/pci/devices/SBDF/irq is not > >> very consistent. > >> > >> If devices are in use by a driver the irq sysfs node reports either > >> the GSI irq or the MSI IRQ (in case a single MSI interrupt is > >> setup). > >> > >> It seems like pciback in Linux does something to report the correct > >> value: > >> > >> root@lcy2-dt107:~# cat /sys/bus/pci/devices/0000\:00\:14.0/irq > >> 74 > >> root@lcy2-dt107:~# xl pci-assignable-add 00:14.0 > >> root@lcy2-dt107:~# cat /sys/bus/pci/devices/0000\:00\:14.0/irq > >> 16 > >> > >> As you can see, making the device assignable changed the value > >> reported by the irq node to be the GSI instead of the MSI IRQ, I would > >> think you are missing something similar in the PVH setup (some pciback > >> magic)? 
> >>
> >> Albeit I have no idea why you would need to translate from IRQ to GSI
> >> in the way you do in this and related patches, because I'm missing the
> >> context.
> >
> > As I mention in another email, also keep in mind that we need QEMU to
> > work and QEMU calls:
> > 1) xc_physdev_map_pirq (this is also called from libxl)
> > 2) xc_domain_bind_pt_pci_irq
> >
> >
> > In this case IRQ != GSI (IRQ == 112, GSI == 28). Sysfs returns the IRQ
> > in Linux (112), but actually xc_physdev_map_pirq expects the GSI, not
> > the IRQ. If you look at the implementation of xc_physdev_map_pirq,
> > you'll the type is "MAP_PIRQ_TYPE_GSI" and also see the check in Xen
> > xen/arch/x86/irq.c:allocate_and_map_gsi_pirq:
> >
> >     if ( index < 0 || index >= nr_irqs_gsi )
> >     {
> >         dprintk(XENLOG_G_ERR, "dom%d: map invalid irq %d\n", d->domain_id,
> >                 index);
> >         return -EINVAL;
> >     }
> >
> > nr_irqs_gsi < 112, and the check will fail.
> >
> > So we need to pass the GSI to xc_physdev_map_pirq. To do that, we need
> > to discover the GSI number corresponding to the IRQ number.
>
> That's one possible approach. Another could be (making a lot of assumptions)
> that a PVH Dom0 would pass in the IRQ it knows for this interrupt and Xen
> then translates that to GSI, knowing that PVH doesn't have (host) GSIs
> exposed to it.

I don't think Xen can translate a Linux IRQ to a GSI, as that's a
Linux abstraction Xen has no part in.

The GSIs exposed to a PVH dom0 are the native (host) ones, as we
create an emulated IO-APIC topology that mimics the physical one.

Question here is why Linux ends up with a IRQ != GSI, as it's my
understanding on Linux GSIs will always be identity mapped to IRQs, and
the IRQ space up to the last possible GSI is explicitly reserved for
this purpose.

Thanks, Roger.
On 16.03.2023 10:27, Roger Pau Monné wrote: > On Thu, Mar 16, 2023 at 09:55:03AM +0100, Jan Beulich wrote: >> On 16.03.2023 01:44, Stefano Stabellini wrote: >>> On Wed, 15 Mar 2023, Roger Pau Monné wrote: >>>> On Sun, Mar 12, 2023 at 03:54:55PM +0800, Huang Rui wrote: >>>>> From: Chen Jiqian <Jiqian.Chen@amd.com> >>>>> >>>>> Use new xc_physdev_gsi_from_irq to get the GSI number >>>>> >>>>> Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com> >>>>> Signed-off-by: Huang Rui <ray.huang@amd.com> >>>>> --- >>>>> tools/libs/light/libxl_pci.c | 1 + >>>>> 1 file changed, 1 insertion(+) >>>>> >>>>> diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c >>>>> index f4c4f17545..47cf2799bf 100644 >>>>> --- a/tools/libs/light/libxl_pci.c >>>>> +++ b/tools/libs/light/libxl_pci.c >>>>> @@ -1486,6 +1486,7 @@ static void pci_add_dm_done(libxl__egc *egc, >>>>> goto out_no_irq; >>>>> } >>>>> if ((fscanf(f, "%u", &irq) == 1) && irq) { >>>>> + irq = xc_physdev_gsi_from_irq(ctx->xch, irq); >>>> >>>> This is just a shot in the dark, because I don't really have enough >>>> context to understand what's going on here, but see below. >>>> >>>> I've taken a look at this on my box, and it seems like on >>>> dom0 the value returned by /sys/bus/pci/devices/SBDF/irq is not >>>> very consistent. >>>> >>>> If devices are in use by a driver the irq sysfs node reports either >>>> the GSI irq or the MSI IRQ (in case a single MSI interrupt is >>>> setup). >>>> >>>> It seems like pciback in Linux does something to report the correct >>>> value: >>>> >>>> root@lcy2-dt107:~# cat /sys/bus/pci/devices/0000\:00\:14.0/irq >>>> 74 >>>> root@lcy2-dt107:~# xl pci-assignable-add 00:14.0 >>>> root@lcy2-dt107:~# cat /sys/bus/pci/devices/0000\:00\:14.0/irq >>>> 16 >>>> >>>> As you can see, making the device assignable changed the value >>>> reported by the irq node to be the GSI instead of the MSI IRQ, I would >>>> think you are missing something similar in the PVH setup (some pciback >>>> magic)? 
>>>> >>>> Albeit I have no idea why you would need to translate from IRQ to GSI >>>> in the way you do in this and related patches, because I'm missing the >>>> context. >>> >>> As I mention in another email, also keep in mind that we need QEMU to >>> work and QEMU calls: >>> 1) xc_physdev_map_pirq (this is also called from libxl) >>> 2) xc_domain_bind_pt_pci_irq >>> >>> >>> In this case IRQ != GSI (IRQ == 112, GSI == 28). Sysfs returns the IRQ >>> in Linux (112), but actually xc_physdev_map_pirq expects the GSI, not >>> the IRQ. If you look at the implementation of xc_physdev_map_pirq, >>> you'll the type is "MAP_PIRQ_TYPE_GSI" and also see the check in Xen >>> xen/arch/x86/irq.c:allocate_and_map_gsi_pirq: >>> >>> if ( index < 0 || index >= nr_irqs_gsi ) >>> { >>> dprintk(XENLOG_G_ERR, "dom%d: map invalid irq %d\n", d->domain_id, >>> index); >>> return -EINVAL; >>> } >>> >>> nr_irqs_gsi < 112, and the check will fail. >>> >>> So we need to pass the GSI to xc_physdev_map_pirq. To do that, we need >>> to discover the GSI number corresponding to the IRQ number. >> >> That's one possible approach. Another could be (making a lot of assumptions) >> that a PVH Dom0 would pass in the IRQ it knows for this interrupt and Xen >> then translates that to GSI, knowing that PVH doesn't have (host) GSIs >> exposed to it. > > I don't think Xen can translate a Linux IRQ to a GSI, as that's a > Linux abstraction Xen has no part in. Well, I was talking about whatever Dom0 and Xen use to communicate. I.e. if at all I might have meant pIRQ, but now that you mention ... > The GSIs exposed to a PVH dom0 are the native (host) ones, as we > create an emulated IO-APIC topology that mimics the physical one. > > Question here is why Linux ends up with a IRQ != GSI, as it's my > understanding on Linux GSIs will always be identity mapped to IRQs, and > the IRQ space up to the last possible GSI is explicitly reserved for > this purpose. ... 
this I guess pIRQ was a PV-only concept, and it really ought to be GSI
in the PVH case. So yes, it then all boils down to that Linux-internal
question.

Jan
On Thu, 16 Mar 2023, Jan Beulich wrote: > On 16.03.2023 10:27, Roger Pau Monné wrote: > > On Thu, Mar 16, 2023 at 09:55:03AM +0100, Jan Beulich wrote: > >> On 16.03.2023 01:44, Stefano Stabellini wrote: > >>> On Wed, 15 Mar 2023, Roger Pau Monné wrote: > >>>> On Sun, Mar 12, 2023 at 03:54:55PM +0800, Huang Rui wrote: > >>>>> From: Chen Jiqian <Jiqian.Chen@amd.com> > >>>>> > >>>>> Use new xc_physdev_gsi_from_irq to get the GSI number > >>>>> > >>>>> Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com> > >>>>> Signed-off-by: Huang Rui <ray.huang@amd.com> > >>>>> --- > >>>>> tools/libs/light/libxl_pci.c | 1 + > >>>>> 1 file changed, 1 insertion(+) > >>>>> > >>>>> diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c > >>>>> index f4c4f17545..47cf2799bf 100644 > >>>>> --- a/tools/libs/light/libxl_pci.c > >>>>> +++ b/tools/libs/light/libxl_pci.c > >>>>> @@ -1486,6 +1486,7 @@ static void pci_add_dm_done(libxl__egc *egc, > >>>>> goto out_no_irq; > >>>>> } > >>>>> if ((fscanf(f, "%u", &irq) == 1) && irq) { > >>>>> + irq = xc_physdev_gsi_from_irq(ctx->xch, irq); > >>>> > >>>> This is just a shot in the dark, because I don't really have enough > >>>> context to understand what's going on here, but see below. > >>>> > >>>> I've taken a look at this on my box, and it seems like on > >>>> dom0 the value returned by /sys/bus/pci/devices/SBDF/irq is not > >>>> very consistent. > >>>> > >>>> If devices are in use by a driver the irq sysfs node reports either > >>>> the GSI irq or the MSI IRQ (in case a single MSI interrupt is > >>>> setup). 
> >>>> > >>>> It seems like pciback in Linux does something to report the correct > >>>> value: > >>>> > >>>> root@lcy2-dt107:~# cat /sys/bus/pci/devices/0000\:00\:14.0/irq > >>>> 74 > >>>> root@lcy2-dt107:~# xl pci-assignable-add 00:14.0 > >>>> root@lcy2-dt107:~# cat /sys/bus/pci/devices/0000\:00\:14.0/irq > >>>> 16 > >>>> > >>>> As you can see, making the device assignable changed the value > >>>> reported by the irq node to be the GSI instead of the MSI IRQ, I would > >>>> think you are missing something similar in the PVH setup (some pciback > >>>> magic)? > >>>> > >>>> Albeit I have no idea why you would need to translate from IRQ to GSI > >>>> in the way you do in this and related patches, because I'm missing the > >>>> context. > >>> > >>> As I mention in another email, also keep in mind that we need QEMU to > >>> work and QEMU calls: > >>> 1) xc_physdev_map_pirq (this is also called from libxl) > >>> 2) xc_domain_bind_pt_pci_irq > >>> > >>> > >>> In this case IRQ != GSI (IRQ == 112, GSI == 28). Sysfs returns the IRQ > >>> in Linux (112), but actually xc_physdev_map_pirq expects the GSI, not > >>> the IRQ. If you look at the implementation of xc_physdev_map_pirq, > >>> you'll the type is "MAP_PIRQ_TYPE_GSI" and also see the check in Xen > >>> xen/arch/x86/irq.c:allocate_and_map_gsi_pirq: > >>> > >>> if ( index < 0 || index >= nr_irqs_gsi ) > >>> { > >>> dprintk(XENLOG_G_ERR, "dom%d: map invalid irq %d\n", d->domain_id, > >>> index); > >>> return -EINVAL; > >>> } > >>> > >>> nr_irqs_gsi < 112, and the check will fail. > >>> > >>> So we need to pass the GSI to xc_physdev_map_pirq. To do that, we need > >>> to discover the GSI number corresponding to the IRQ number. > >> > >> That's one possible approach. Another could be (making a lot of assumptions) > >> that a PVH Dom0 would pass in the IRQ it knows for this interrupt and Xen > >> then translates that to GSI, knowing that PVH doesn't have (host) GSIs > >> exposed to it. 
> >
> > I don't think Xen can translate a Linux IRQ to a GSI, as that's a
> > Linux abstraction Xen has no part in.
>
> Well, I was talking about whatever Dom0 and Xen use to communicate. I.e.
> if at all I might have meant pIRQ, but now that you mention ...
>
> > The GSIs exposed to a PVH dom0 are the native (host) ones, as we
> > create an emulated IO-APIC topology that mimics the physical one.
> >
> > Question here is why Linux ends up with a IRQ != GSI, as it's my
> > understanding on Linux GSIs will always be identity mapped to IRQs, and
> > the IRQ space up to the last possible GSI is explicitly reserved for
> > this purpose.
>
> ... this I guess pIRQ was a PV-only concept, and it really ought to be
> GSI in the PVH case. So yes, it then all boils down to that Linux-
> internal question.

Excellent question but we'll have to wait for Ray as he is the one
with access to the hardware. But I have this data I can share in the
meantime:

[ 1.260378] IRQ to pin mappings:
[ 1.260387] IRQ1 -> 0:1
[ 1.260395] IRQ2 -> 0:2
[ 1.260403] IRQ3 -> 0:3
[ 1.260410] IRQ4 -> 0:4
[ 1.260418] IRQ5 -> 0:5
[ 1.260425] IRQ6 -> 0:6
[ 1.260432] IRQ7 -> 0:7
[ 1.260440] IRQ8 -> 0:8
[ 1.260447] IRQ9 -> 0:9
[ 1.260455] IRQ10 -> 0:10
[ 1.260462] IRQ11 -> 0:11
[ 1.260470] IRQ12 -> 0:12
[ 1.260478] IRQ13 -> 0:13
[ 1.260485] IRQ14 -> 0:14
[ 1.260493] IRQ15 -> 0:15
[ 1.260505] IRQ106 -> 1:8
[ 1.260513] IRQ112 -> 1:4
[ 1.260521] IRQ116 -> 1:13
[ 1.260529] IRQ117 -> 1:14
[ 1.260537] IRQ118 -> 1:15
[ 1.260544] .................................... done.

And I think Ray traced the point in Linux where Linux gives us an IRQ
== 112 (which is the one causing issues):

__acpi_register_gsi->
  acpi_register_gsi_ioapic->
    mp_map_gsi_to_irq->
      mp_map_pin_to_irq->
        __irq_resolve_mapping()

    if (likely(data)) {
        desc = irq_data_to_desc(data);
        if (irq)
            *irq = data->irq; /* this IRQ is 112, IO-APIC-34 domain */
    }
On 17.03.2023 00:19, Stefano Stabellini wrote:
> On Thu, 16 Mar 2023, Jan Beulich wrote:
>> So yes, it then all boils down to that Linux-
>> internal question.
>
> Excellent question but we'll have to wait for Ray as he is the one with
> access to the hardware. But I have this data I can share in the
> meantime:
>
> [ 1.260378] IRQ to pin mappings:
> [ 1.260387] IRQ1 -> 0:1
> [ 1.260395] IRQ2 -> 0:2
> [ 1.260403] IRQ3 -> 0:3
> [ 1.260410] IRQ4 -> 0:4
> [ 1.260418] IRQ5 -> 0:5
> [ 1.260425] IRQ6 -> 0:6
> [ 1.260432] IRQ7 -> 0:7
> [ 1.260440] IRQ8 -> 0:8
> [ 1.260447] IRQ9 -> 0:9
> [ 1.260455] IRQ10 -> 0:10
> [ 1.260462] IRQ11 -> 0:11
> [ 1.260470] IRQ12 -> 0:12
> [ 1.260478] IRQ13 -> 0:13
> [ 1.260485] IRQ14 -> 0:14
> [ 1.260493] IRQ15 -> 0:15
> [ 1.260505] IRQ106 -> 1:8
> [ 1.260513] IRQ112 -> 1:4
> [ 1.260521] IRQ116 -> 1:13
> [ 1.260529] IRQ117 -> 1:14
> [ 1.260537] IRQ118 -> 1:15
> [ 1.260544] .................................... done.

And what does Linux think are IRQs 16 ... 105? Have you compared with
Linux running baremetal on the same hardware?

Jan

> And I think Ray traced the point in Linux where Linux gives us an IRQ ==
> 112 (which is the one causing issues):
>
> __acpi_register_gsi->
>   acpi_register_gsi_ioapic->
>     mp_map_gsi_to_irq->
>       mp_map_pin_to_irq->
>         __irq_resolve_mapping()
>
>     if (likely(data)) {
>         desc = irq_data_to_desc(data);
>         if (irq)
>             *irq = data->irq; /* this IRQ is 112, IO-APIC-34 domain */
>     }
On Fri, Mar 17, 2023 at 09:39:52AM +0100, Jan Beulich wrote:
> On 17.03.2023 00:19, Stefano Stabellini wrote:
> > On Thu, 16 Mar 2023, Jan Beulich wrote:
> >> So yes, it then all boils down to that Linux-
> >> internal question.
> >
> > Excellent question but we'll have to wait for Ray as he is the one with
> > access to the hardware. But I have this data I can share in the
> > meantime:
> >
> > [ 1.260378] IRQ to pin mappings:
> > [ 1.260387] IRQ1 -> 0:1
> > [ 1.260395] IRQ2 -> 0:2
> > [ 1.260403] IRQ3 -> 0:3
> > [ 1.260410] IRQ4 -> 0:4
> > [ 1.260418] IRQ5 -> 0:5
> > [ 1.260425] IRQ6 -> 0:6
> > [ 1.260432] IRQ7 -> 0:7
> > [ 1.260440] IRQ8 -> 0:8
> > [ 1.260447] IRQ9 -> 0:9
> > [ 1.260455] IRQ10 -> 0:10
> > [ 1.260462] IRQ11 -> 0:11
> > [ 1.260470] IRQ12 -> 0:12
> > [ 1.260478] IRQ13 -> 0:13
> > [ 1.260485] IRQ14 -> 0:14
> > [ 1.260493] IRQ15 -> 0:15
> > [ 1.260505] IRQ106 -> 1:8
> > [ 1.260513] IRQ112 -> 1:4
> > [ 1.260521] IRQ116 -> 1:13
> > [ 1.260529] IRQ117 -> 1:14
> > [ 1.260537] IRQ118 -> 1:15
> > [ 1.260544] .................................... done.
>
> And what does Linux think are IRQs 16 ... 105? Have you compared with
> Linux running baremetal on the same hardware?

So I have some emails from Ray from the time he was looking into this,
and on Linux dom0 PVH dmesg there is:

[ 0.065063] IOAPIC[0]: apic_id 33, version 17, address 0xfec00000, GSI 0-23
[ 0.065096] IOAPIC[1]: apic_id 34, version 17, address 0xfec01000, GSI 24-55

So it seems the vIO-APIC data provided by Xen to dom0 is at least
consistent.
> > And I think Ray traced the point in Linux where Linux gives us an IRQ ==
> > 112 (which is the one causing issues):
> >
> > __acpi_register_gsi->
> >   acpi_register_gsi_ioapic->
> >     mp_map_gsi_to_irq->
> >       mp_map_pin_to_irq->
> >         __irq_resolve_mapping()
> >
> >     if (likely(data)) {
> >         desc = irq_data_to_desc(data);
> >         if (irq)
> >             *irq = data->irq; /* this IRQ is 112, IO-APIC-34 domain */
> >     }

Could this all be a result of patch 4/5 in the Linux series ("[RFC
PATCH 4/5] x86/xen: acpi registers gsi for xen pvh"), where a
different __acpi_register_gsi hook is installed for PVH in order to
setup GSIs using PHYSDEV ops instead of doing it natively from the
IO-APIC?

FWIW, the introduced function in that patch
(acpi_register_gsi_xen_pvh()) seems to unconditionally call
acpi_register_gsi_ioapic() without checking if the GSI is already
registered, which might lead to multiple IRQs being allocated for the
same underlying GSI?

As I commented there, I think that approach is wrong. If the GSI has
not been mapped in Xen (because dom0 hasn't unmasked the respective
IO-APIC pin) we should add some logic in the toolstack to map it
before attempting to bind.

Thanks, Roger.
On Fri, 17 Mar 2023, Roger Pau Monné wrote: > On Fri, Mar 17, 2023 at 09:39:52AM +0100, Jan Beulich wrote: > > On 17.03.2023 00:19, Stefano Stabellini wrote: > > > On Thu, 16 Mar 2023, Jan Beulich wrote: > > >> So yes, it then all boils down to that Linux- > > >> internal question. > > > > > > Excellent question but we'll have to wait for Ray as he is the one with > > > access to the hardware. But I have this data I can share in the > > > meantime: > > > > > > [ 1.260378] IRQ to pin mappings: > > > [ 1.260387] IRQ1 -> 0:1 > > > [ 1.260395] IRQ2 -> 0:2 > > > [ 1.260403] IRQ3 -> 0:3 > > > [ 1.260410] IRQ4 -> 0:4 > > > [ 1.260418] IRQ5 -> 0:5 > > > [ 1.260425] IRQ6 -> 0:6 > > > [ 1.260432] IRQ7 -> 0:7 > > > [ 1.260440] IRQ8 -> 0:8 > > > [ 1.260447] IRQ9 -> 0:9 > > > [ 1.260455] IRQ10 -> 0:10 > > > [ 1.260462] IRQ11 -> 0:11 > > > [ 1.260470] IRQ12 -> 0:12 > > > [ 1.260478] IRQ13 -> 0:13 > > > [ 1.260485] IRQ14 -> 0:14 > > > [ 1.260493] IRQ15 -> 0:15 > > > [ 1.260505] IRQ106 -> 1:8 > > > [ 1.260513] IRQ112 -> 1:4 > > > [ 1.260521] IRQ116 -> 1:13 > > > [ 1.260529] IRQ117 -> 1:14 > > > [ 1.260537] IRQ118 -> 1:15 > > > [ 1.260544] .................................... done. > > > > And what does Linux think are IRQs 16 ... 105? Have you compared with > > Linux running baremetal on the same hardware? > > So I have some emails from Ray from he time he was looking into this, > and on Linux dom0 PVH dmesg there is: > > [ 0.065063] IOAPIC[0]: apic_id 33, version 17, address 0xfec00000, GSI 0-23 > [ 0.065096] IOAPIC[1]: apic_id 34, version 17, address 0xfec01000, GSI 24-55 > > So it seems the vIO-APIC data provided by Xen to dom0 is at least > consistent. 
>
> > > And I think Ray traced the point in Linux where Linux gives us an IRQ ==
> > > 112 (which is the one causing issues):
> > >
> > > __acpi_register_gsi->
> > >   acpi_register_gsi_ioapic->
> > >     mp_map_gsi_to_irq->
> > >       mp_map_pin_to_irq->
> > >         __irq_resolve_mapping()
> > >
> > >     if (likely(data)) {
> > >         desc = irq_data_to_desc(data);
> > >         if (irq)
> > >             *irq = data->irq; /* this IRQ is 112, IO-APIC-34 domain */
> > >     }
>
> Could this all be a result of patch 4/5 in the Linux series ("[RFC
> PATCH 4/5] x86/xen: acpi registers gsi for xen pvh"), where a different
> __acpi_register_gsi hook is installed for PVH in order to setup GSIs
> using PHYSDEV ops instead of doing it natively from the IO-APIC?
>
> FWIW, the introduced function in that patch
> (acpi_register_gsi_xen_pvh()) seems to unconditionally call
> acpi_register_gsi_ioapic() without checking if the GSI is already
> registered, which might lead to multiple IRQs being allocated for the
> same underlying GSI?

I understand this point and I think it needs investigating.

> As I commented there, I think that approach is wrong. If the GSI has
> not been mapped in Xen (because dom0 hasn't unmasked the respective
> IO-APIC pin) we should add some logic in the toolstack to map it
> before attempting to bind.

But this statement confuses me. The toolstack doesn't get involved in
IRQ setup for PCI devices for HVM guests? Keep in mind that this is a
regular HVM guest creation on PVH Dom0, so normally the IRQ setup is
done by QEMU, and QEMU already calls xc_physdev_map_pirq and
xc_domain_bind_pt_pci_irq. So I don't follow your statement about "the
toolstack to map it before attempting to bind".
On Fri, Mar 17, 2023 at 11:15:37AM -0700, Stefano Stabellini wrote: > On Fri, 17 Mar 2023, Roger Pau Monné wrote: > > On Fri, Mar 17, 2023 at 09:39:52AM +0100, Jan Beulich wrote: > > > On 17.03.2023 00:19, Stefano Stabellini wrote: > > > > On Thu, 16 Mar 2023, Jan Beulich wrote: > > > >> So yes, it then all boils down to that Linux- > > > >> internal question. > > > > > > > > Excellent question but we'll have to wait for Ray as he is the one with > > > > access to the hardware. But I have this data I can share in the > > > > meantime: > > > > > > > > [ 1.260378] IRQ to pin mappings: > > > > [ 1.260387] IRQ1 -> 0:1 > > > > [ 1.260395] IRQ2 -> 0:2 > > > > [ 1.260403] IRQ3 -> 0:3 > > > > [ 1.260410] IRQ4 -> 0:4 > > > > [ 1.260418] IRQ5 -> 0:5 > > > > [ 1.260425] IRQ6 -> 0:6 > > > > [ 1.260432] IRQ7 -> 0:7 > > > > [ 1.260440] IRQ8 -> 0:8 > > > > [ 1.260447] IRQ9 -> 0:9 > > > > [ 1.260455] IRQ10 -> 0:10 > > > > [ 1.260462] IRQ11 -> 0:11 > > > > [ 1.260470] IRQ12 -> 0:12 > > > > [ 1.260478] IRQ13 -> 0:13 > > > > [ 1.260485] IRQ14 -> 0:14 > > > > [ 1.260493] IRQ15 -> 0:15 > > > > [ 1.260505] IRQ106 -> 1:8 > > > > [ 1.260513] IRQ112 -> 1:4 > > > > [ 1.260521] IRQ116 -> 1:13 > > > > [ 1.260529] IRQ117 -> 1:14 > > > > [ 1.260537] IRQ118 -> 1:15 > > > > [ 1.260544] .................................... done. > > > > > > And what does Linux think are IRQs 16 ... 105? Have you compared with > > > Linux running baremetal on the same hardware? > > > > So I have some emails from Ray from he time he was looking into this, > > and on Linux dom0 PVH dmesg there is: > > > > [ 0.065063] IOAPIC[0]: apic_id 33, version 17, address 0xfec00000, GSI 0-23 > > [ 0.065096] IOAPIC[1]: apic_id 34, version 17, address 0xfec01000, GSI 24-55 > > > > So it seems the vIO-APIC data provided by Xen to dom0 is at least > > consistent. 
> > > > > > And I think Ray traced the point in Linux where Linux gives us an IRQ == > > > > 112 (which is the one causing issues): > > > > > > > > __acpi_register_gsi-> > > > > acpi_register_gsi_ioapic-> > > > > mp_map_gsi_to_irq-> > > > > mp_map_pin_to_irq-> > > > > __irq_resolve_mapping() > > > > > > > > if (likely(data)) { > > > > desc = irq_data_to_desc(data); > > > > if (irq) > > > > *irq = data->irq; > > > > /* this IRQ is 112, IO-APIC-34 domain */ > > > > } > > > > > > Could this all be a result of patch 4/5 in the Linux series ("[RFC > > PATCH 4/5] x86/xen: acpi registers gsi for xen pvh"), where a different > > __acpi_register_gsi hook is installed for PVH in order to setup GSIs > > using PHYSDEV ops instead of doing it natively from the IO-APIC? > > > > FWIW, the introduced function in that patch > > (acpi_register_gsi_xen_pvh()) seems to unconditionally call > > acpi_register_gsi_ioapic() without checking if the GSI is already > > registered, which might lead to multiple IRQs being allocated for the > > same underlying GSI? > > I understand this point and I think it needs investigating. > > > > As I commented there, I think that approach is wrong. If the GSI has > > not been mapped in Xen (because dom0 hasn't unmasked the respective > > IO-APIC pin) we should add some logic in the toolstack to map it > > before attempting to bind. > > But this statement confuses me. The toolstack doesn't get involved in > IRQ setup for PCI devices for HVM guests? It does for GSI interrupts AFAICT, see pci_add_dm_done() and the call to xc_physdev_map_pirq(). I'm not sure whether that's a remnant that could be removed (maybe for qemu-trad only?) or it's also required by QEMU upstream, I would have to investigate more. It's my understanding it's in pci_add_dm_done() where Ray was getting the mismatched IRQ vs GSI number. Thanks, Roger.
On Fri, 17 Mar 2023, Roger Pau Monné wrote: > On Fri, Mar 17, 2023 at 11:15:37AM -0700, Stefano Stabellini wrote: > > On Fri, 17 Mar 2023, Roger Pau Monné wrote: > > > On Fri, Mar 17, 2023 at 09:39:52AM +0100, Jan Beulich wrote: > > > > On 17.03.2023 00:19, Stefano Stabellini wrote: > > > > > On Thu, 16 Mar 2023, Jan Beulich wrote: > > > > >> So yes, it then all boils down to that Linux- > > > > >> internal question. > > > > > > > > > > Excellent question but we'll have to wait for Ray as he is the one with > > > > > access to the hardware. But I have this data I can share in the > > > > > meantime: > > > > > > > > > > [ 1.260378] IRQ to pin mappings: > > > > > [ 1.260387] IRQ1 -> 0:1 > > > > > [ 1.260395] IRQ2 -> 0:2 > > > > > [ 1.260403] IRQ3 -> 0:3 > > > > > [ 1.260410] IRQ4 -> 0:4 > > > > > [ 1.260418] IRQ5 -> 0:5 > > > > > [ 1.260425] IRQ6 -> 0:6 > > > > > [ 1.260432] IRQ7 -> 0:7 > > > > > [ 1.260440] IRQ8 -> 0:8 > > > > > [ 1.260447] IRQ9 -> 0:9 > > > > > [ 1.260455] IRQ10 -> 0:10 > > > > > [ 1.260462] IRQ11 -> 0:11 > > > > > [ 1.260470] IRQ12 -> 0:12 > > > > > [ 1.260478] IRQ13 -> 0:13 > > > > > [ 1.260485] IRQ14 -> 0:14 > > > > > [ 1.260493] IRQ15 -> 0:15 > > > > > [ 1.260505] IRQ106 -> 1:8 > > > > > [ 1.260513] IRQ112 -> 1:4 > > > > > [ 1.260521] IRQ116 -> 1:13 > > > > > [ 1.260529] IRQ117 -> 1:14 > > > > > [ 1.260537] IRQ118 -> 1:15 > > > > > [ 1.260544] .................................... done. > > > > > > > > And what does Linux think are IRQs 16 ... 105? Have you compared with > > > > Linux running baremetal on the same hardware? > > > > > > So I have some emails from Ray from he time he was looking into this, > > > and on Linux dom0 PVH dmesg there is: > > > > > > [ 0.065063] IOAPIC[0]: apic_id 33, version 17, address 0xfec00000, GSI 0-23 > > > [ 0.065096] IOAPIC[1]: apic_id 34, version 17, address 0xfec01000, GSI 24-55 > > > > > > So it seems the vIO-APIC data provided by Xen to dom0 is at least > > > consistent. 
> > > > > > > > And I think Ray traced the point in Linux where Linux gives us an IRQ == > > > > > 112 (which is the one causing issues): > > > > > > > > > > __acpi_register_gsi-> > > > > > acpi_register_gsi_ioapic-> > > > > > mp_map_gsi_to_irq-> > > > > > mp_map_pin_to_irq-> > > > > > __irq_resolve_mapping() > > > > > > > > > > if (likely(data)) { > > > > > desc = irq_data_to_desc(data); > > > > > if (irq) > > > > > *irq = data->irq; > > > > > /* this IRQ is 112, IO-APIC-34 domain */ > > > > > } > > > > > > > > > Could this all be a result of patch 4/5 in the Linux series ("[RFC > > > PATCH 4/5] x86/xen: acpi registers gsi for xen pvh"), where a different > > > __acpi_register_gsi hook is installed for PVH in order to setup GSIs > > > using PHYSDEV ops instead of doing it natively from the IO-APIC? > > > > > > FWIW, the introduced function in that patch > > > (acpi_register_gsi_xen_pvh()) seems to unconditionally call > > > acpi_register_gsi_ioapic() without checking if the GSI is already > > > registered, which might lead to multiple IRQs being allocated for the > > > same underlying GSI? > > > > I understand this point and I think it needs investigating. > > > > > > > As I commented there, I think that approach is wrong. If the GSI has > > > not been mapped in Xen (because dom0 hasn't unmasked the respective > > > IO-APIC pin) we should add some logic in the toolstack to map it > > > before attempting to bind. > > > > But this statement confuses me. The toolstack doesn't get involved in > > IRQ setup for PCI devices for HVM guests? > > It does for GSI interrupts AFAICT, see pci_add_dm_done() and the call > to xc_physdev_map_pirq(). I'm not sure whether that's a remnant that > cold be removed (maybe for qemu-trad only?) or it's also required by > QEMU upstream, I would have to investigate more. You are right. I am not certain, but it seems like a mistake in the toolstack to me. In theory, pci_add_dm_done should only be needed for PV guests, not for HVM guests. 
I am not sure. But I can see the call to xc_physdev_map_pirq you were referring to now. > It's my understanding it's in pci_add_dm_done() where Ray was getting > the mismatched IRQ vs GSI number. I think the mismatch was actually caused by the xc_physdev_map_pirq call from QEMU, which makes sense because in any case it should happen before the same call done by pci_add_dm_done (pci_add_dm_done is called after sending the pci passthrough QMP command to QEMU). So the first to hit the IRQ!=GSI problem would be QEMU.
On Fri, Mar 17, 2023 at 01:55:08PM -0700, Stefano Stabellini wrote: > On Fri, 17 Mar 2023, Roger Pau Monné wrote: > > On Fri, Mar 17, 2023 at 11:15:37AM -0700, Stefano Stabellini wrote: > > > On Fri, 17 Mar 2023, Roger Pau Monné wrote: > > > > On Fri, Mar 17, 2023 at 09:39:52AM +0100, Jan Beulich wrote: > > > > > On 17.03.2023 00:19, Stefano Stabellini wrote: > > > > > > On Thu, 16 Mar 2023, Jan Beulich wrote: > > > > > >> So yes, it then all boils down to that Linux- > > > > > >> internal question. > > > > > > > > > > > > Excellent question but we'll have to wait for Ray as he is the one with > > > > > > access to the hardware. But I have this data I can share in the > > > > > > meantime: > > > > > > > > > > > > [ 1.260378] IRQ to pin mappings: > > > > > > [ 1.260387] IRQ1 -> 0:1 > > > > > > [ 1.260395] IRQ2 -> 0:2 > > > > > > [ 1.260403] IRQ3 -> 0:3 > > > > > > [ 1.260410] IRQ4 -> 0:4 > > > > > > [ 1.260418] IRQ5 -> 0:5 > > > > > > [ 1.260425] IRQ6 -> 0:6 > > > > > > [ 1.260432] IRQ7 -> 0:7 > > > > > > [ 1.260440] IRQ8 -> 0:8 > > > > > > [ 1.260447] IRQ9 -> 0:9 > > > > > > [ 1.260455] IRQ10 -> 0:10 > > > > > > [ 1.260462] IRQ11 -> 0:11 > > > > > > [ 1.260470] IRQ12 -> 0:12 > > > > > > [ 1.260478] IRQ13 -> 0:13 > > > > > > [ 1.260485] IRQ14 -> 0:14 > > > > > > [ 1.260493] IRQ15 -> 0:15 > > > > > > [ 1.260505] IRQ106 -> 1:8 > > > > > > [ 1.260513] IRQ112 -> 1:4 > > > > > > [ 1.260521] IRQ116 -> 1:13 > > > > > > [ 1.260529] IRQ117 -> 1:14 > > > > > > [ 1.260537] IRQ118 -> 1:15 > > > > > > [ 1.260544] .................................... done. > > > > > > > > > > And what does Linux think are IRQs 16 ... 105? Have you compared with > > > > > Linux running baremetal on the same hardware? 
> > > > > > > > So I have some emails from Ray from he time he was looking into this, > > > > and on Linux dom0 PVH dmesg there is: > > > > > > > > [ 0.065063] IOAPIC[0]: apic_id 33, version 17, address 0xfec00000, GSI 0-23 > > > > [ 0.065096] IOAPIC[1]: apic_id 34, version 17, address 0xfec01000, GSI 24-55 > > > > > > > > So it seems the vIO-APIC data provided by Xen to dom0 is at least > > > > consistent. > > > > > > > > > > And I think Ray traced the point in Linux where Linux gives us an IRQ == > > > > > > 112 (which is the one causing issues): > > > > > > > > > > > > __acpi_register_gsi-> > > > > > > acpi_register_gsi_ioapic-> > > > > > > mp_map_gsi_to_irq-> > > > > > > mp_map_pin_to_irq-> > > > > > > __irq_resolve_mapping() > > > > > > > > > > > > if (likely(data)) { > > > > > > desc = irq_data_to_desc(data); > > > > > > if (irq) > > > > > > *irq = data->irq; > > > > > > /* this IRQ is 112, IO-APIC-34 domain */ > > > > > > } > > > > > > > > > > > > Could this all be a result of patch 4/5 in the Linux series ("[RFC > > > > PATCH 4/5] x86/xen: acpi registers gsi for xen pvh"), where a different > > > > __acpi_register_gsi hook is installed for PVH in order to setup GSIs > > > > using PHYSDEV ops instead of doing it natively from the IO-APIC? > > > > > > > > FWIW, the introduced function in that patch > > > > (acpi_register_gsi_xen_pvh()) seems to unconditionally call > > > > acpi_register_gsi_ioapic() without checking if the GSI is already > > > > registered, which might lead to multiple IRQs being allocated for the > > > > same underlying GSI? > > > > > > I understand this point and I think it needs investigating. > > > > > > > > > > As I commented there, I think that approach is wrong. If the GSI has > > > > not been mapped in Xen (because dom0 hasn't unmasked the respective > > > > IO-APIC pin) we should add some logic in the toolstack to map it > > > > before attempting to bind. > > > > > > But this statement confuses me. 
The toolstack doesn't get involved in > > > IRQ setup for PCI devices for HVM guests? > > > > It does for GSI interrupts AFAICT, see pci_add_dm_done() and the call > > to xc_physdev_map_pirq(). I'm not sure whether that's a remnant that > > cold be removed (maybe for qemu-trad only?) or it's also required by > > QEMU upstream, I would have to investigate more. > > You are right. I am not certain, but it seems like a mistake in the > toolstack to me. In theory, pci_add_dm_done should only be needed for PV > guests, not for HVM guests. I am not sure. But I can see the call to > xc_physdev_map_pirq you were referring to now. > > > > It's my understanding it's in pci_add_dm_done() where Ray was getting > > the mismatched IRQ vs GSI number. > > I think the mismatch was actually caused by the xc_physdev_map_pirq call > from QEMU, which makes sense because in any case it should happen before > the same call done by pci_add_dm_done (pci_add_dm_done is called after > sending the pci passthrough QMP command to QEMU). So the first to hit > the IRQ!=GSI problem would be QEMU. I've been thinking about this a bit, and I think one of the possible issues with the current handling of GSIs in a PVH dom0 is that GSIs don't get registered until/unless they are unmasked. I could see this as a problem when doing passthrough: it's possible for a GSI (iow: vIO-APIC pin) to never get unmasked on dom0, because the device driver(s) are using MSI(-X) interrupts instead. However, the IO-APIC pin must be configured for it to be able to be mapped into a domU. A possible solution is to propagate the vIO-APIC pin configuration trigger/polarity when dom0 writes the low part of the redirection table entry. The patch below enables the usage of PHYSDEVOP_{un,}map_pirq from PVH domains (I need to assert this is secure even for domUs) and also propagates the vIO-APIC pin trigger/polarity mode on writes to the low part of the RTE. 
Such propagation leads to the following interrupt setup in Xen:

IRQ: 0 vec:f0 IO-APIC-edge status=000 aff:{0}/{0} arch/x86/time.c#timer_interrupt()
IRQ: 1 vec:38 IO-APIC-edge status=002 aff:{0-7}/{0} mapped, unbound
IRQ: 2 vec:a8 IO-APIC-edge status=000 aff:{0-7}/{0-7} no_action()
IRQ: 3 vec:f1 IO-APIC-edge status=000 aff:{0-7}/{0-7} drivers/char/ns16550.c#ns16550_interrupt()
IRQ: 4 vec:40 IO-APIC-edge status=002 aff:{0-7}/{0} mapped, unbound
IRQ: 5 vec:48 IO-APIC-edge status=002 aff:{0-7}/{0} mapped, unbound
IRQ: 6 vec:50 IO-APIC-edge status=002 aff:{0-7}/{0} mapped, unbound
IRQ: 7 vec:58 IO-APIC-edge status=006 aff:{0-7}/{0} mapped, unbound
IRQ: 8 vec:60 IO-APIC-edge status=010 aff:{0}/{0} in-flight=0 d0: 8(-M-)
IRQ: 9 vec:68 IO-APIC-edge status=010 aff:{0}/{0} in-flight=0 d0: 9(-M-)
IRQ: 10 vec:70 IO-APIC-edge status=002 aff:{0-7}/{0} mapped, unbound
IRQ: 11 vec:78 IO-APIC-edge status=002 aff:{0-7}/{0} mapped, unbound
IRQ: 12 vec:88 IO-APIC-edge status=002 aff:{0-7}/{0} mapped, unbound
IRQ: 13 vec:90 IO-APIC-edge status=002 aff:{0-7}/{0} mapped, unbound
IRQ: 14 vec:98 IO-APIC-edge status=002 aff:{0-7}/{0} mapped, unbound
IRQ: 15 vec:a0 IO-APIC-edge status=002 aff:{0-7}/{0} mapped, unbound
IRQ: 16 vec:b0 IO-APIC-edge status=010 aff:{1}/{0-7} in-flight=0 d0: 16(-M-)
IRQ: 17 vec:b8 IO-APIC-edge status=002 aff:{0-7}/{0-7} mapped, unbound
IRQ: 18 vec:c0 IO-APIC-edge status=002 aff:{0-7}/{0-7} mapped, unbound
IRQ: 19 vec:c8 IO-APIC-edge status=002 aff:{0-7}/{0-7} mapped, unbound
IRQ: 20 vec:d0 IO-APIC-edge status=010 aff:{1}/{0-7} in-flight=0 d0: 20(-M-)
IRQ: 21 vec:d8 IO-APIC-edge status=002 aff:{0-7}/{0-7} mapped, unbound
IRQ: 22 vec:e0 IO-APIC-edge status=002 aff:{0-7}/{0-7} mapped, unbound
IRQ: 23 vec:e8 IO-APIC-edge status=002 aff:{0-7}/{0-7} mapped, unbound

Note how now all GSIs on my box are setup, even when not bound to dom0 anymore.
The output without this patch looks like:

IRQ: 0 vec:f0 IO-APIC-edge status=000 aff:{0}/{0} arch/x86/time.c#timer_interrupt()
IRQ: 1 vec:38 IO-APIC-edge status=002 aff:{0}/{0} mapped, unbound
IRQ: 3 vec:f1 IO-APIC-edge status=000 aff:{0-7}/{0-7} drivers/char/ns16550.c#ns16550_interrupt()
IRQ: 4 vec:40 IO-APIC-edge status=002 aff:{0}/{0} mapped, unbound
IRQ: 5 vec:48 IO-APIC-edge status=002 aff:{0}/{0} mapped, unbound
IRQ: 6 vec:50 IO-APIC-edge status=002 aff:{0}/{0} mapped, unbound
IRQ: 7 vec:58 IO-APIC-edge status=006 aff:{0}/{0} mapped, unbound
IRQ: 8 vec:d0 IO-APIC-edge status=010 aff:{6}/{0-7} in-flight=0 d0: 8(-M-)
IRQ: 9 vec:a8 IO-APIC-level status=010 aff:{2}/{0-7} in-flight=0 d0: 9(-M-)
IRQ: 10 vec:70 IO-APIC-edge status=002 aff:{0}/{0} mapped, unbound
IRQ: 11 vec:78 IO-APIC-edge status=002 aff:{0}/{0} mapped, unbound
IRQ: 12 vec:88 IO-APIC-edge status=002 aff:{0}/{0} mapped, unbound
IRQ: 13 vec:90 IO-APIC-edge status=002 aff:{0}/{0} mapped, unbound
IRQ: 14 vec:98 IO-APIC-edge status=002 aff:{0}/{0} mapped, unbound
IRQ: 15 vec:a0 IO-APIC-edge status=002 aff:{0}/{0} mapped, unbound
IRQ: 16 vec:e0 IO-APIC-level status=010 aff:{6}/{0-7} in-flight=0 d0: 16(-M-),d1: 16(-M-)
IRQ: 20 vec:d8 IO-APIC-level status=010 aff:{6}/{0-7} in-flight=0 d0: 20(-M-)

Legacy IRQs (below 16) are always registered.

With the patch above I seem to be able to do PCI passthrough to an HVM domU from a PVH dom0.

Regards, Roger.
---
diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 405d0a95af..cc53a3bd12 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -86,6 +86,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     {
     case PHYSDEVOP_map_pirq:
     case PHYSDEVOP_unmap_pirq:
+        break;
+
     case PHYSDEVOP_eoi:
     case PHYSDEVOP_irq_status_query:
     case PHYSDEVOP_get_free_pirq:
diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
index 41e3c4d5e4..50e23a093c 100644
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -180,9 +180,7 @@ static int vioapic_hwdom_map_gsi(unsigned int gsi, unsigned int trig,
     /* Interrupt has been unmasked, bind it now. */
     ret = mp_register_gsi(gsi, trig, pol);
-    if ( ret == -EEXIST )
-        return 0;
-    if ( ret )
+    if ( ret && ret != -EEXIST )
     {
         gprintk(XENLOG_WARNING, "vioapic: error registering GSI %u: %d\n",
                 gsi, ret);
@@ -244,12 +242,18 @@ static void vioapic_write_redirent(
     }
     else
     {
+        int ret;
+
         unmasked = ent.fields.mask;
         /* Remote IRR and Delivery Status are read-only. */
         ent.bits = ((ent.bits >> 32) << 32) | val;
         ent.fields.delivery_status = 0;
         ent.fields.remote_irr = pent->fields.remote_irr;
         unmasked = unmasked && !ent.fields.mask;
+        ret = mp_register_gsi(gsi, ent.fields.trig_mode, ent.fields.polarity);
+        if ( ret && ret != -EEXIST )
+            gprintk(XENLOG_WARNING, "vioapic: error registering GSI %u: %d\n",
+                    gsi, ret);
     }

     *pent = ent;
On 20.03.2023 16:16, Roger Pau Monné wrote:
> @@ -244,12 +242,18 @@ static void vioapic_write_redirent(
>      }
>      else
>      {
> +        int ret;
> +
>          unmasked = ent.fields.mask;
>          /* Remote IRR and Delivery Status are read-only. */
>          ent.bits = ((ent.bits >> 32) << 32) | val;
>          ent.fields.delivery_status = 0;
>          ent.fields.remote_irr = pent->fields.remote_irr;
>          unmasked = unmasked && !ent.fields.mask;
> +        ret = mp_register_gsi(gsi, ent.fields.trig_mode, ent.fields.polarity);
> +        if ( ret && ret != -EEXIST )
> +            gprintk(XENLOG_WARNING, "vioapic: error registering GSI %u: %d\n",
> +                    gsi, ret);
>      }

I assume this is only meant to be experimental, as I'm missing confinement
to Dom0 here. I also question this when the mask bit is set, as in that case
neither the trigger mode bit nor the polarity one can be relied upon. At
which point it would look to me as if it was necessary for Dom0 to use a
hypercall instead (which naturally would then be PHYSDEVOP_setup_gsi).

Jan
On Mon, Mar 20, 2023 at 04:29:25PM +0100, Jan Beulich wrote:
> On 20.03.2023 16:16, Roger Pau Monné wrote:
> > @@ -244,12 +242,18 @@ static void vioapic_write_redirent(
> >      }
> >      else
> >      {
> > +        int ret;
> > +
> >          unmasked = ent.fields.mask;
> >          /* Remote IRR and Delivery Status are read-only. */
> >          ent.bits = ((ent.bits >> 32) << 32) | val;
> >          ent.fields.delivery_status = 0;
> >          ent.fields.remote_irr = pent->fields.remote_irr;
> >          unmasked = unmasked && !ent.fields.mask;
> > +        ret = mp_register_gsi(gsi, ent.fields.trig_mode, ent.fields.polarity);
> > +        if ( ret && ret != -EEXIST )
> > +            gprintk(XENLOG_WARNING, "vioapic: error registering GSI %u: %d\n",
> > +                    gsi, ret);
> >      }
>
> I assume this is only meant to be experimental, as I'm missing confinement
> to Dom0 here.

Indeed. I've attached a fixed version below, let's make sure this doesn't
influence testing.

> I also question this when the mask bit is set, as in that
> case neither the trigger mode bit nor the polarity one can be relied upon.
> At which point it would look to me as if it was necessary for Dom0 to use
> a hypercall instead (which naturally would then be PHYSDEVOP_setup_gsi).

AFAICT Linux does correctly set the trigger/polarity even when the pins are
masked, so this should be safe as a proof of concept.

Let's first figure out whether the issue is really with the lack of setup of
the IO-APIC pins. In the end, without input from Ray this is just a wild
guess.

Regards, Roger.
----
diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c
index 405d0a95af..cc53a3bd12 100644
--- a/xen/arch/x86/hvm/hypercall.c
+++ b/xen/arch/x86/hvm/hypercall.c
@@ -86,6 +86,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
     {
     case PHYSDEVOP_map_pirq:
     case PHYSDEVOP_unmap_pirq:
+        break;
+
     case PHYSDEVOP_eoi:
     case PHYSDEVOP_irq_status_query:
     case PHYSDEVOP_get_free_pirq:
diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
index 41e3c4d5e4..64f7b5bcc5 100644
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -180,9 +180,7 @@ static int vioapic_hwdom_map_gsi(unsigned int gsi, unsigned int trig,
     /* Interrupt has been unmasked, bind it now. */
     ret = mp_register_gsi(gsi, trig, pol);
-    if ( ret == -EEXIST )
-        return 0;
-    if ( ret )
+    if ( ret && ret != -EEXIST )
     {
         gprintk(XENLOG_WARNING, "vioapic: error registering GSI %u: %d\n",
                 gsi, ret);
@@ -250,6 +248,16 @@ static void vioapic_write_redirent(
         ent.fields.delivery_status = 0;
         ent.fields.remote_irr = pent->fields.remote_irr;
         unmasked = unmasked && !ent.fields.mask;
+        if ( is_hardware_domain(d) )
+        {
+            int ret = mp_register_gsi(gsi, ent.fields.trig_mode,
+                                      ent.fields.polarity);
+
+            if ( ret && ret != -EEXIST )
+                gprintk(XENLOG_WARNING,
+                        "vioapic: error registering GSI %u: %d\n",
+                        gsi, ret);
+        }
     }

     *pent = ent;
Hi, On 2023/3/18 04:55, Stefano Stabellini wrote: > On Fri, 17 Mar 2023, Roger Pau Monné wrote: >> On Fri, Mar 17, 2023 at 11:15:37AM -0700, Stefano Stabellini wrote: >>> On Fri, 17 Mar 2023, Roger Pau Monné wrote: >>>> On Fri, Mar 17, 2023 at 09:39:52AM +0100, Jan Beulich wrote: >>>>> On 17.03.2023 00:19, Stefano Stabellini wrote: >>>>>> On Thu, 16 Mar 2023, Jan Beulich wrote: >>>>>>> So yes, it then all boils down to that Linux- >>>>>>> internal question. >>>>>> >>>>>> Excellent question but we'll have to wait for Ray as he is the one with >>>>>> access to the hardware. But I have this data I can share in the >>>>>> meantime: >>>>>> >>>>>> [ 1.260378] IRQ to pin mappings: >>>>>> [ 1.260387] IRQ1 -> 0:1 >>>>>> [ 1.260395] IRQ2 -> 0:2 >>>>>> [ 1.260403] IRQ3 -> 0:3 >>>>>> [ 1.260410] IRQ4 -> 0:4 >>>>>> [ 1.260418] IRQ5 -> 0:5 >>>>>> [ 1.260425] IRQ6 -> 0:6 >>>>>> [ 1.260432] IRQ7 -> 0:7 >>>>>> [ 1.260440] IRQ8 -> 0:8 >>>>>> [ 1.260447] IRQ9 -> 0:9 >>>>>> [ 1.260455] IRQ10 -> 0:10 >>>>>> [ 1.260462] IRQ11 -> 0:11 >>>>>> [ 1.260470] IRQ12 -> 0:12 >>>>>> [ 1.260478] IRQ13 -> 0:13 >>>>>> [ 1.260485] IRQ14 -> 0:14 >>>>>> [ 1.260493] IRQ15 -> 0:15 >>>>>> [ 1.260505] IRQ106 -> 1:8 >>>>>> [ 1.260513] IRQ112 -> 1:4 >>>>>> [ 1.260521] IRQ116 -> 1:13 >>>>>> [ 1.260529] IRQ117 -> 1:14 >>>>>> [ 1.260537] IRQ118 -> 1:15 >>>>>> [ 1.260544] .................................... done. >>>>> >>>>> And what does Linux think are IRQs 16 ... 105? Have you compared with >>>>> Linux running baremetal on the same hardware? >>>> >>>> So I have some emails from Ray from he time he was looking into this, >>>> and on Linux dom0 PVH dmesg there is: >>>> >>>> [ 0.065063] IOAPIC[0]: apic_id 33, version 17, address 0xfec00000, GSI 0-23 >>>> [ 0.065096] IOAPIC[1]: apic_id 34, version 17, address 0xfec01000, GSI 24-55 >>>> >>>> So it seems the vIO-APIC data provided by Xen to dom0 is at least >>>> consistent. 
>>>> >>>>>> And I think Ray traced the point in Linux where Linux gives us an IRQ == >>>>>> 112 (which is the one causing issues): >>>>>> >>>>>> __acpi_register_gsi-> >>>>>> acpi_register_gsi_ioapic-> >>>>>> mp_map_gsi_to_irq-> >>>>>> mp_map_pin_to_irq-> >>>>>> __irq_resolve_mapping() >>>>>> >>>>>> if (likely(data)) { >>>>>> desc = irq_data_to_desc(data); >>>>>> if (irq) >>>>>> *irq = data->irq; >>>>>> /* this IRQ is 112, IO-APIC-34 domain */ >>>>>> } >>>> >>>> >>>> Could this all be a result of patch 4/5 in the Linux series ("[RFC >>>> PATCH 4/5] x86/xen: acpi registers gsi for xen pvh"), where a different >>>> __acpi_register_gsi hook is installed for PVH in order to setup GSIs >>>> using PHYSDEV ops instead of doing it natively from the IO-APIC? >>>> >>>> FWIW, the introduced function in that patch >>>> (acpi_register_gsi_xen_pvh()) seems to unconditionally call >>>> acpi_register_gsi_ioapic() without checking if the GSI is already >>>> registered, which might lead to multiple IRQs being allocated for the >>>> same underlying GSI? >>> >>> I understand this point and I think it needs investigating. >>> >>> >>>> As I commented there, I think that approach is wrong. If the GSI has >>>> not been mapped in Xen (because dom0 hasn't unmasked the respective >>>> IO-APIC pin) we should add some logic in the toolstack to map it >>>> before attempting to bind. >>> >>> But this statement confuses me. The toolstack doesn't get involved in >>> IRQ setup for PCI devices for HVM guests? >> >> It does for GSI interrupts AFAICT, see pci_add_dm_done() and the call >> to xc_physdev_map_pirq(). I'm not sure whether that's a remnant that >> cold be removed (maybe for qemu-trad only?) or it's also required by >> QEMU upstream, I would have to investigate more. > > You are right. I am not certain, but it seems like a mistake in the > toolstack to me. In theory, pci_add_dm_done should only be needed for PV > guests, not for HVM guests. I am not sure. 
> But I can see the call to
> xc_physdev_map_pirq you were referring to now.
>
>> It's my understanding it's in pci_add_dm_done() where Ray was getting
>> the mismatched IRQ vs GSI number.
>
> I think the mismatch was actually caused by the xc_physdev_map_pirq call
> from QEMU, which makes sense because in any case it should happen before
> the same call done by pci_add_dm_done (pci_add_dm_done is called after
> sending the pci passthrough QMP command to QEMU). So the first to hit
> the IRQ!=GSI problem would be QEMU.

Sorry for the late reply, and thank you all for the review. I realized that your questions mainly focus on the following points:

1. Why is the IRQ not equal to the GSI?
2. Why do I translate between the IRQ and the GSI?
3. Why do I call PHYSDEVOP_map_pirq in acpi_register_gsi_xen_pvh()?
4. Why do I call PHYSDEVOP_setup_gsi in acpi_register_gsi_xen_pvh()?

Please forgive me for giving a summary response first; I am looking forward to your comments.

1. Why is the IRQ not equal to the GSI?

As far as I know, the IRQ is dynamically allocated according to the GSI, so they are not necessarily equal. I run "sudo xl pci-assignable-add 03:00.0" to assign a passthrough device (taking the dGPU in my environment as an example; its GSI is 28).
It will call into acpi_register_gsi_ioapic to get the IRQ; the call stack is:

acpi_register_gsi_ioapic
  mp_map_gsi_to_irq
    mp_map_pin_to_irq
      irq_find_mapping (if the GSI has been mapped to an IRQ before, it will return the corresponding IRQ here)
      alloc_irq_from_domain
        __irq_domain_alloc_irqs
          irq_domain_alloc_descs
            __irq_alloc_descs

If you add some debug printks like below:
---------------------------------------------------------------------------------------------------------------------------------------------
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index a868b76cd3d4..970fd461be7a 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1067,6 +1067,8 @@ static int mp_map_pin_to_irq(u32 gsi, int idx, int ioapic, int pin,
 		}
 	}
 	mutex_unlock(&ioapic_mutex);
+	printk("cjq_debug mp_map_pin_to_irq gsi: %u, irq: %d, idx: %d, ioapic: %d, pin: %d\n",
+		gsi, irq, idx, ioapic, pin);
 	return irq;
 }
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index 5db0230aa6b5..4e9613abbe96 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -786,6 +786,8 @@ __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
 	start = bitmap_find_next_zero_area(allocated_irqs, IRQ_BITMAP_BITS,
 					   from, cnt, 0);
 	ret = -EEXIST;
+	printk("cjq_debug __irq_alloc_descs irq: %d, from: %u, cnt: %u, node: %d, start: %d, nr_irqs: %d\n",
+		irq, from, cnt, node, start, nr_irqs);
 	if (irq >=0 && start != irq)
 		goto unlock;
---------------------------------------------------------------------------------------------------------------------------------------------

You will get the following output on PVH dom0:

[ 0.181560] cjq_debug __irq_alloc_descs irq: 1, from: 1, cnt: 1, node: -1, start: 1, nr_irqs: 1096
[ 0.181639] cjq_debug mp_map_pin_to_irq gsi: 1, irq: 1, idx: 2, ioapic: 0, pin: 1
[ 0.181641] cjq_debug __irq_alloc_descs irq: 2, from: 2, cnt: 1, node: -1, start: 2, nr_irqs: 1096
[ 0.181682] cjq_debug mp_map_pin_to_irq gsi: 2, irq: 2, idx: 0,
ioapic: 0, pin: 2 [ 0.181683] cjq_debug __irq_alloc_descs irq: 3, from: 3, cnt: 1, node: -1, start: 3, nr_irqs: 1096 [ 0.181715] cjq_debug mp_map_pin_to_irq gsi: 3, irq: 3, idx: 3, ioapic: 0, pin: 3 [ 0.181716] cjq_debug __irq_alloc_descs irq: 4, from: 4, cnt: 1, node: -1, start: 4, nr_irqs: 1096 [ 0.181751] cjq_debug mp_map_pin_to_irq gsi: 4, irq: 4, idx: 4, ioapic: 0, pin: 4 [ 0.181752] cjq_debug __irq_alloc_descs irq: 5, from: 5, cnt: 1, node: -1, start: 5, nr_irqs: 1096 [ 0.181783] cjq_debug mp_map_pin_to_irq gsi: 5, irq: 5, idx: 5, ioapic: 0, pin: 5 [ 0.181784] cjq_debug __irq_alloc_descs irq: 6, from: 6, cnt: 1, node: -1, start: 6, nr_irqs: 1096 [ 0.181813] cjq_debug mp_map_pin_to_irq gsi: 6, irq: 6, idx: 6, ioapic: 0, pin: 6 [ 0.181814] cjq_debug __irq_alloc_descs irq: 7, from: 7, cnt: 1, node: -1, start: 7, nr_irqs: 1096 [ 0.181856] cjq_debug mp_map_pin_to_irq gsi: 7, irq: 7, idx: 7, ioapic: 0, pin: 7 [ 0.181857] cjq_debug __irq_alloc_descs irq: 8, from: 8, cnt: 1, node: -1, start: 8, nr_irqs: 1096 [ 0.181888] cjq_debug mp_map_pin_to_irq gsi: 8, irq: 8, idx: 8, ioapic: 0, pin: 8 [ 0.181889] cjq_debug __irq_alloc_descs irq: 9, from: 9, cnt: 1, node: -1, start: 9, nr_irqs: 1096 [ 0.181918] cjq_debug mp_map_pin_to_irq gsi: 9, irq: 9, idx: 1, ioapic: 0, pin: 9 [ 0.181919] cjq_debug __irq_alloc_descs irq: 10, from: 10, cnt: 1, node: -1, start: 10, nr_irqs: 1096 [ 0.181950] cjq_debug mp_map_pin_to_irq gsi: 10, irq: 10, idx: 9, ioapic: 0, pin: 10 [ 0.181951] cjq_debug __irq_alloc_descs irq: 11, from: 11, cnt: 1, node: -1, start: 11, nr_irqs: 1096 [ 0.181977] cjq_debug mp_map_pin_to_irq gsi: 11, irq: 11, idx: 10, ioapic: 0, pin: 11 [ 0.181979] cjq_debug __irq_alloc_descs irq: 12, from: 12, cnt: 1, node: -1, start: 12, nr_irqs: 1096 [ 0.182006] cjq_debug mp_map_pin_to_irq gsi: 12, irq: 12, idx: 11, ioapic: 0, pin: 12 [ 0.182007] cjq_debug __irq_alloc_descs irq: 13, from: 13, cnt: 1, node: -1, start: 13, nr_irqs: 1096 [ 0.182034] cjq_debug mp_map_pin_to_irq gsi: 13, 
irq: 13, idx: 12, ioapic: 0, pin: 13 [ 0.182035] cjq_debug __irq_alloc_descs irq: 14, from: 14, cnt: 1, node: -1, start: 14, nr_irqs: 1096 [ 0.182066] cjq_debug mp_map_pin_to_irq gsi: 14, irq: 14, idx: 13, ioapic: 0, pin: 14 [ 0.182067] cjq_debug __irq_alloc_descs irq: 15, from: 15, cnt: 1, node: -1, start: 15, nr_irqs: 1096 [ 0.182095] cjq_debug mp_map_pin_to_irq gsi: 15, irq: 15, idx: 14, ioapic: 0, pin: 15 [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 24, nr_irqs: 1096 [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 25, nr_irqs: 1096 [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 26, nr_irqs: 1096 [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 27, nr_irqs: 1096 [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 28, nr_irqs: 1096 [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 29, nr_irqs: 1096 [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 30, nr_irqs: 1096 [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 31, nr_irqs: 1096 [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 32, nr_irqs: 1096 [ 0.188491] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 33, nr_irqs: 1096 [ 0.188491] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 34, nr_irqs: 1096 [ 0.188491] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 35, nr_irqs: 1096 [ 0.188491] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 36, nr_irqs: 1096 [ 0.188491] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 37, nr_irqs: 1096 [ 0.192282] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 38, nr_irqs: 1096 [ 0.192282] cjq_debug __irq_alloc_descs irq: -1, from: 
24, cnt: 1, node: -1, start: 39, nr_irqs: 1096 [ 0.192282] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 40, nr_irqs: 1096 [ 0.192282] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 41, nr_irqs: 1096 [ 0.192282] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 42, nr_irqs: 1096 [ 0.196208] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 43, nr_irqs: 1096 [ 0.196208] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 44, nr_irqs: 1096 [ 0.196208] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 45, nr_irqs: 1096 [ 0.196208] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 46, nr_irqs: 1096 [ 0.196208] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 47, nr_irqs: 1096 [ 0.198199] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 48, nr_irqs: 1096 [ 0.198416] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 49, nr_irqs: 1096 [ 0.198460] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 50, nr_irqs: 1096 [ 0.198489] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 51, nr_irqs: 1096 [ 0.198523] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 52, nr_irqs: 1096 [ 0.201315] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 53, nr_irqs: 1096 [ 0.202174] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 54, nr_irqs: 1096 [ 0.202225] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 55, nr_irqs: 1096 [ 0.202259] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 56, nr_irqs: 1096 [ 0.202291] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 57, nr_irqs: 1096 [ 0.205239] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 58, nr_irqs: 1096 [ 0.205239] 
cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 59, nr_irqs: 1096 [ 0.205239] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 60, nr_irqs: 1096 [ 0.205239] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 61, nr_irqs: 1096 [ 0.205239] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 62, nr_irqs: 1096 [ 0.208653] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 63, nr_irqs: 1096 [ 0.208653] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 64, nr_irqs: 1096 [ 0.208653] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 65, nr_irqs: 1096 [ 0.208653] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 66, nr_irqs: 1096 [ 0.208653] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 67, nr_irqs: 1096 [ 0.210169] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 68, nr_irqs: 1096 [ 0.210322] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 69, nr_irqs: 1096 [ 0.210370] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 70, nr_irqs: 1096 [ 0.210403] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 71, nr_irqs: 1096 [ 0.210436] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 72, nr_irqs: 1096 [ 0.213190] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 73, nr_irqs: 1096 [ 0.213190] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 74, nr_irqs: 1096 [ 0.213190] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 75, nr_irqs: 1096 [ 0.213190] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 76, nr_irqs: 1096 [ 0.214151] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 77, nr_irqs: 1096 [ 0.217075] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 
-1, start: 78, nr_irqs: 1096 [ 0.217075] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 79, nr_irqs: 1096 [ 0.217075] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 80, nr_irqs: 1096 [ 0.217075] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 81, nr_irqs: 1096 [ 0.217075] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 82, nr_irqs: 1096 [ 0.220389] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 83, nr_irqs: 1096 [ 0.220389] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 84, nr_irqs: 1096 [ 0.220389] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 85, nr_irqs: 1096 [ 0.220389] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 86, nr_irqs: 1096 [ 0.220389] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 87, nr_irqs: 1096 [ 0.222215] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 88, nr_irqs: 1096 [ 0.222366] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 89, nr_irqs: 1096 [ 0.222410] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 90, nr_irqs: 1096 [ 0.222447] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 91, nr_irqs: 1096 [ 0.222478] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 92, nr_irqs: 1096 [ 0.225490] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 93, nr_irqs: 1096 [ 0.226225] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 94, nr_irqs: 1096 [ 0.226268] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 95, nr_irqs: 1096 [ 0.226300] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 96, nr_irqs: 1096 [ 0.226329] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 97, nr_irqs: 1096 [ 0.229057] cjq_debug 
__irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 98, nr_irqs: 1096 [ 0.229057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 99, nr_irqs: 1096 [ 0.229057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 100, nr_irqs: 1096 [ 0.229057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 101, nr_irqs: 1096 [ 0.229057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 102, nr_irqs: 1096 [ 0.232399] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 103, nr_irqs: 1096 [ 0.248854] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 104, nr_irqs: 1096 [ 0.250609] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 105, nr_irqs: 1096 [ 0.372343] cjq_debug mp_map_pin_to_irq gsi: 9, irq: 9, idx: 1, ioapic: 0, pin: 9 [ 0.720950] cjq_debug mp_map_pin_to_irq gsi: 8, irq: 8, idx: 8, ioapic: 0, pin: 8 [ 0.721052] cjq_debug mp_map_pin_to_irq gsi: 13, irq: 13, idx: 12, ioapic: 0, pin: 13 [ 1.254825] cjq_debug mp_map_pin_to_irq gsi: 7, irq: -16, idx: 7, ioapic: 0, pin: 7 [ 1.333081] cjq_debug mp_map_pin_to_irq gsi: 1, irq: 1, idx: 2, ioapic: 0, pin: 1 [ 1.375882] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 106, nr_irqs: 1096 [ 1.375951] cjq_debug mp_map_pin_to_irq gsi: 32, irq: 106, idx: -1, ioapic: 1, pin: 8 [ 1.376072] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 107, nr_irqs: 1096 [ 1.376121] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 107, idx: -1, ioapic: 1, pin: 13 [ 1.472551] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 107, idx: -1, ioapic: 1, pin: 13 [ 1.472697] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 107, nr_irqs: 1096 [ 1.472751] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 107, idx: -1, ioapic: 1, pin: 14 [ 1.484290] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 107, idx: -1, ioapic: 1, pin: 14 [ 1.768163] cjq_debug __irq_alloc_descs 
irq: -1, from: 24, cnt: 1, node: -1, start: 107, nr_irqs: 1096 [ 1.768627] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 108, nr_irqs: 1096 [ 1.769059] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 109, nr_irqs: 1096 [ 1.769694] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 110, nr_irqs: 1096 [ 1.770169] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 111, nr_irqs: 1096 [ 1.770697] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 112, nr_irqs: 1096 [ 1.770738] cjq_debug mp_map_pin_to_irq gsi: 28, irq: 112, idx: -1, ioapic: 1, pin: 4 [ 1.770789] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 113, nr_irqs: 1096 [ 1.771230] cjq_debug mp_map_pin_to_irq gsi: 28, irq: 112, idx: -1, ioapic: 1, pin: 4 [ 1.771278] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 114, nr_irqs: 1096 [ 2.127884] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 115, nr_irqs: 1096 [ 3.207419] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 116, nr_irqs: 1096 [ 3.207730] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 116, idx: -1, ioapic: 1, pin: 13 [ 3.208120] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 117, nr_irqs: 1096 [ 3.208475] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 117, idx: -1, ioapic: 1, pin: 12 [ 3.208478] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 118, nr_irqs: 1096 [ 3.208861] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 116, idx: -1, ioapic: 1, pin: 13 [ 3.208933] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 119, nr_irqs: 1096 [ 3.209127] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 120, nr_irqs: 1096 [ 3.209383] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 121, nr_irqs: 1096 [ 3.209863] cjq_debug __irq_alloc_descs irq: -1, from: 24, 
cnt: 1, node: -1, start: 122, nr_irqs: 1096 [ 3.211439] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 123, nr_irqs: 1096 [ 3.211833] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 124, nr_irqs: 1096 [ 3.212873] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 125, nr_irqs: 1096 [ 3.243514] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 126, nr_irqs: 1096 [ 3.243689] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 126, idx: -1, ioapic: 1, pin: 14 [ 3.244293] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 127, nr_irqs: 1096 [ 3.244534] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 128, nr_irqs: 1096 [ 3.244714] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 129, nr_irqs: 1096 [ 3.244911] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 130, nr_irqs: 1096 [ 3.245096] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 131, nr_irqs: 1096 [ 3.245633] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 132, nr_irqs: 1096 [ 3.247890] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 133, nr_irqs: 1096 [ 3.248192] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 134, nr_irqs: 1096 [ 3.271093] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 135, nr_irqs: 1096 [ 3.307045] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 136, nr_irqs: 1096 [ 3.307162] cjq_debug mp_map_pin_to_irq gsi: 48, irq: 136, idx: -1, ioapic: 1, pin: 24 [ 3.307223] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 137, nr_irqs: 1096 [ 3.331183] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 137, nr_irqs: 1096 [ 3.331295] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 138, nr_irqs: 1096 [ 3.331366] cjq_debug 
__irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 139, nr_irqs: 1096 [ 3.331438] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 140, nr_irqs: 1096 [ 3.331511] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 141, nr_irqs: 1096 [ 3.331579] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 142, nr_irqs: 1096 [ 3.331646] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 143, nr_irqs: 1096 [ 3.331713] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 144, nr_irqs: 1096 [ 3.331780] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 145, nr_irqs: 1096 [ 3.331846] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 146, nr_irqs: 1096 [ 3.331913] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 147, nr_irqs: 1096 [ 3.331984] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 148, nr_irqs: 1096 [ 3.332051] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 149, nr_irqs: 1096 [ 3.332118] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 150, nr_irqs: 1096 [ 3.332183] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 151, nr_irqs: 1096 [ 3.332252] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 152, nr_irqs: 1096 [ 3.332319] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 153, nr_irqs: 1096 [ 8.010370] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 116, idx: -1, ioapic: 1, pin: 13 [ 9.545439] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 117, idx: -1, ioapic: 1, pin: 12 [ 9.545713] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 154, nr_irqs: 1096 [ 9.546034] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 155, nr_irqs: 1096 [ 9.687796] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 156, nr_irqs: 1096 [ 
9.687979] cjq_debug mp_map_pin_to_irq gsi: 39, irq: 156, idx: -1, ioapic: 1, pin: 15
[ 9.688057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 157, nr_irqs: 1096
[ 9.921038] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 158, nr_irqs: 1096
[ 9.921210] cjq_debug mp_map_pin_to_irq gsi: 29, irq: 158, idx: -1, ioapic: 1, pin: 5
[ 9.921403] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 159, nr_irqs: 1096
[ 9.926373] cjq_debug mp_map_pin_to_irq gsi: 39, irq: 156, idx: -1, ioapic: 1, pin: 15
[ 9.926747] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 160, nr_irqs: 1096
[ 9.928201] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 117, idx: -1, ioapic: 1, pin: 12
[ 9.928488] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 161, nr_irqs: 1096
[ 10.653915] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 162, nr_irqs: 1096
[ 10.656257] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 163, nr_irqs: 1096

You can see that the IRQ allocated is not always based on the GSI value. Allocation follows a first-requested, first-served principle: for example, GSI 32 gets IRQ 106 while GSI 28 gets IRQ 112. And acpi_register_gsi_ioapic() is not the only caller of __irq_alloc_descs(); other functions call it too, some even earlier. The above output (from PVH dom0) behaves just like bare metal. So we can conclude that irq != gsi.
See below output on linux: [ 0.105053] cjq_debug mp_map_pin_to_irq gsi: 1, irq: 1, idx: 2, ioapic: 0, pin: 1 [ 0.105061] cjq_debug mp_map_pin_to_irq gsi: 2, irq: 0, idx: 0, ioapic: 0, pin: 2 [ 0.105069] cjq_debug mp_map_pin_to_irq gsi: 3, irq: 3, idx: 3, ioapic: 0, pin: 3 [ 0.105078] cjq_debug mp_map_pin_to_irq gsi: 4, irq: 4, idx: 4, ioapic: 0, pin: 4 [ 0.105086] cjq_debug mp_map_pin_to_irq gsi: 5, irq: 5, idx: 5, ioapic: 0, pin: 5 [ 0.105094] cjq_debug mp_map_pin_to_irq gsi: 6, irq: 6, idx: 6, ioapic: 0, pin: 6 [ 0.105103] cjq_debug mp_map_pin_to_irq gsi: 7, irq: 7, idx: 7, ioapic: 0, pin: 7 [ 0.105111] cjq_debug mp_map_pin_to_irq gsi: 8, irq: 8, idx: 8, ioapic: 0, pin: 8 [ 0.105119] cjq_debug mp_map_pin_to_irq gsi: 9, irq: 9, idx: 1, ioapic: 0, pin: 9 [ 0.105127] cjq_debug mp_map_pin_to_irq gsi: 10, irq: 10, idx: 9, ioapic: 0, pin: 10 [ 0.105136] cjq_debug mp_map_pin_to_irq gsi: 11, irq: 11, idx: 10, ioapic: 0, pin: 11 [ 0.105144] cjq_debug mp_map_pin_to_irq gsi: 12, irq: 12, idx: 11, ioapic: 0, pin: 12 [ 0.105152] cjq_debug mp_map_pin_to_irq gsi: 13, irq: 13, idx: 12, ioapic: 0, pin: 13 [ 0.105160] cjq_debug mp_map_pin_to_irq gsi: 14, irq: 14, idx: 13, ioapic: 0, pin: 14 [ 0.105169] cjq_debug mp_map_pin_to_irq gsi: 15, irq: 15, idx: 14, ioapic: 0, pin: 15 [ 0.398134] cjq_debug mp_map_pin_to_irq gsi: 9, irq: 9, idx: 1, ioapic: 0, pin: 9 [ 1.169293] cjq_debug mp_map_pin_to_irq gsi: 8, irq: 8, idx: 8, ioapic: 0, pin: 8 [ 1.169394] cjq_debug mp_map_pin_to_irq gsi: 13, irq: 13, idx: 12, ioapic: 0, pin: 13 [ 1.323132] cjq_debug mp_map_pin_to_irq gsi: 7, irq: 7, idx: 7, ioapic: 0, pin: 7 [ 1.345425] cjq_debug mp_map_pin_to_irq gsi: 1, irq: 1, idx: 2, ioapic: 0, pin: 1 [ 1.375502] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 24, nr_irqs: 1096 [ 1.375575] cjq_debug mp_map_pin_to_irq gsi: 32, irq: 24, idx: -1, ioapic: 1, pin: 8 [ 1.375661] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 25, nr_irqs: 1096 [ 1.375705] 
cjq_debug mp_map_pin_to_irq gsi: 37, irq: 25, idx: -1, ioapic: 1, pin: 13 [ 1.442277] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 25, idx: -1, ioapic: 1, pin: 13 [ 1.442393] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 25, nr_irqs: 1096 [ 1.442450] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 25, idx: -1, ioapic: 1, pin: 14 [ 1.453893] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 25, idx: -1, ioapic: 1, pin: 14 [ 1.456127] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 25, nr_irqs: 1096 [ 1.734065] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 26, nr_irqs: 1096 [ 1.734165] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 27, nr_irqs: 1096 [ 1.734253] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 28, nr_irqs: 1096 [ 1.734344] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 29, nr_irqs: 1096 [ 1.734426] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 30, nr_irqs: 1096 [ 1.734512] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 31, nr_irqs: 1096 [ 1.734597] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 32, nr_irqs: 1096 [ 1.734643] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 33, nr_irqs: 1096 [ 1.734687] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 34, nr_irqs: 1096 [ 1.734728] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 35, nr_irqs: 1096 [ 1.735017] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 36, nr_irqs: 1096 [ 1.735252] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 37, nr_irqs: 1096 [ 1.735467] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 38, nr_irqs: 1096 [ 1.735799] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 39, nr_irqs: 1096 [ 1.736024] cjq_debug 
__irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 40, nr_irqs: 1096 [ 1.736364] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 41, nr_irqs: 1096 [ 1.736406] cjq_debug mp_map_pin_to_irq gsi: 28, irq: 41, idx: -1, ioapic: 1, pin: 4 [ 1.736434] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 42, nr_irqs: 1096 [ 1.736701] cjq_debug mp_map_pin_to_irq gsi: 28, irq: 41, idx: -1, ioapic: 1, pin: 4 [ 1.736724] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 43, nr_irqs: 1096 [ 3.037123] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 44, nr_irqs: 1096 [ 3.037313] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 44, idx: -1, ioapic: 1, pin: 13 [ 3.037515] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 44, idx: -1, ioapic: 1, pin: 13 [ 3.037738] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 45, nr_irqs: 1096 [ 3.037959] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 46, nr_irqs: 1096 [ 3.038073] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 47, nr_irqs: 1096 [ 3.038154] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 48, nr_irqs: 1096 [ 3.038179] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 47, idx: -1, ioapic: 1, pin: 12 [ 3.038277] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 49, nr_irqs: 1096 [ 3.038399] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 50, nr_irqs: 1096 [ 3.038525] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 51, nr_irqs: 1096 [ 3.038657] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 52, nr_irqs: 1096 [ 3.038852] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 53, nr_irqs: 1096 [ 3.052377] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 54, nr_irqs: 1096 [ 3.052479] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 54, idx: 
-1, ioapic: 1, pin: 14 [ 3.052730] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 55, nr_irqs: 1096 [ 3.052840] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 56, nr_irqs: 1096 [ 3.052918] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 57, nr_irqs: 1096 [ 3.052987] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 58, nr_irqs: 1096 [ 3.053069] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 59, nr_irqs: 1096 [ 3.053139] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 60, nr_irqs: 1096 [ 3.053201] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 61, nr_irqs: 1096 [ 3.053260] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 62, nr_irqs: 1096 [ 3.089128] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 63, nr_irqs: 1096 [ 3.089310] cjq_debug mp_map_pin_to_irq gsi: 48, irq: 63, idx: -1, ioapic: 1, pin: 24 [ 3.089376] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 64, nr_irqs: 1096 [ 3.103435] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 65, nr_irqs: 1096 [ 3.114190] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 64, nr_irqs: 1096 [ 3.114346] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 66, nr_irqs: 1096 [ 3.121215] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 67, nr_irqs: 1096 [ 3.121350] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 68, nr_irqs: 1096 [ 3.121479] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 69, nr_irqs: 1096 [ 3.121612] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 70, nr_irqs: 1096 [ 3.121726] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 71, nr_irqs: 1096 [ 3.121841] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 
1, node: 0, start: 72, nr_irqs: 1096 [ 3.121955] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 73, nr_irqs: 1096 [ 3.122025] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 74, nr_irqs: 1096 [ 3.122093] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 75, nr_irqs: 1096 [ 3.122148] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 76, nr_irqs: 1096 [ 3.122203] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 77, nr_irqs: 1096 [ 3.122265] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 78, nr_irqs: 1096 [ 3.122322] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 79, nr_irqs: 1096 [ 3.122378] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 80, nr_irqs: 1096 [ 3.122433] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 81, nr_irqs: 1096 [ 7.838753] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 44, idx: -1, ioapic: 1, pin: 13 [ 9.619174] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 47, idx: -1, ioapic: 1, pin: 12 [ 9.619556] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 82, nr_irqs: 1096 [ 9.622038] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 83, nr_irqs: 1096 [ 9.634900] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 84, nr_irqs: 1096 [ 9.635316] cjq_debug mp_map_pin_to_irq gsi: 39, irq: 84, idx: -1, ioapic: 1, pin: 15 [ 9.635405] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 85, nr_irqs: 1096 [ 10.006686] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 86, nr_irqs: 1096 [ 10.006823] cjq_debug mp_map_pin_to_irq gsi: 29, irq: 86, idx: -1, ioapic: 1, pin: 5 [ 10.007009] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 87, nr_irqs: 1096 [ 10.008723] cjq_debug mp_map_pin_to_irq gsi: 39, irq: 84, idx: -1, ioapic: 1, pin: 15 [ 
10.009853] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 88, nr_irqs: 1096
[ 10.010786] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 47, idx: -1, ioapic: 1, pin: 12
[ 10.010858] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 89, nr_irqs: 1096

2. Why do I translate between irq and gsi?
After answering question 1, we know irq != gsi. I found that in QEMU, pci_qdev_realize->xen_pt_realize->xen_host_pci_device_get->xen_host_pci_get_hex_value reads the IRQ number (from sysfs), but later pci_qdev_realize->xen_pt_realize->xc_physdev_map_pirq requires a GSI to be passed in, because it calls into Xen's physdev_map_pirq->allocate_and_map_gsi_pirq to allocate a pirq for the GSI. That is where the error occurred. Not only that, the callback function pci_add_dm_done->xc_physdev_map_pirq also needs a GSI. So I added the function xc_physdev_gsi_from_irq() to translate an IRQ to its GSI for QEMU. I didn't find a similar function in the existing Linux code, and I think only "QEMU passthrough for Xen" needs this translation, so I added it into privcmd. If you know of any similar functions or a more suitable place, please feel free to tell me.

3. Why do I call PHYSDEVOP_map_pirq in acpi_register_gsi_xen_pvh()?
Because if you want to map a GSI for a domU, it must already have a mapping in dom0. See the QEMU/libxl code path:
pci_add_dm_done
    xc_physdev_map_pirq
    xc_domain_irq_permission
        XEN_DOMCTL_irq_permission
            pirq_access_permitted
xc_physdev_map_pirq gets the pirq mapped from the GSI, and xc_domain_irq_permission passes that pirq into Xen. If we don't do PHYSDEVOP_map_pirq for passthrough devices on PVH dom0, then pirq_access_permitted finds a NULL irq for dom0 and fails. So I added PHYSDEVOP_map_pirq for PVH dom0. But I think it is only necessary for passthrough devices, not for every device that calls __acpi_register_gsi. In the next version of the patch, I will restrict PHYSDEVOP_map_pirq to passthrough devices only.

4.
Why do I call PHYSDEVOP_setup_gsi in acpi_register_gsi_xen_pvh()?
As Roger commented, the GSI of a passthrough device never gets unmasked and registered (I added printks in vioapic_hwdom_map_gsi() and found it is never called for the dGPU with GSI 28 in my environment). So I called PHYSDEVOP_setup_gsi to register the GSI. But I agree with Roger's and Jan's opinion that it is wrong to do PHYSDEVOP_setup_gsi for all devices. So, in the next version of the patch, I will likewise restrict PHYSDEVOP_setup_gsi to passthrough devices only.
On Mon, Jul 31, 2023 at 04:40:35PM +0000, Chen, Jiqian wrote: > Hi, > > On 2023/3/18 04:55, Stefano Stabellini wrote: > > On Fri, 17 Mar 2023, Roger Pau Monné wrote: > >> On Fri, Mar 17, 2023 at 11:15:37AM -0700, Stefano Stabellini wrote: > >>> On Fri, 17 Mar 2023, Roger Pau Monné wrote: > >>>> On Fri, Mar 17, 2023 at 09:39:52AM +0100, Jan Beulich wrote: > >>>>> On 17.03.2023 00:19, Stefano Stabellini wrote: > >>>>>> On Thu, 16 Mar 2023, Jan Beulich wrote: > >>>>>>> So yes, it then all boils down to that Linux- > >>>>>>> internal question. > >>>>>> > >>>>>> Excellent question but we'll have to wait for Ray as he is the one with > >>>>>> access to the hardware. But I have this data I can share in the > >>>>>> meantime: > >>>>>> > >>>>>> [ 1.260378] IRQ to pin mappings: > >>>>>> [ 1.260387] IRQ1 -> 0:1 > >>>>>> [ 1.260395] IRQ2 -> 0:2 > >>>>>> [ 1.260403] IRQ3 -> 0:3 > >>>>>> [ 1.260410] IRQ4 -> 0:4 > >>>>>> [ 1.260418] IRQ5 -> 0:5 > >>>>>> [ 1.260425] IRQ6 -> 0:6 > >>>>>> [ 1.260432] IRQ7 -> 0:7 > >>>>>> [ 1.260440] IRQ8 -> 0:8 > >>>>>> [ 1.260447] IRQ9 -> 0:9 > >>>>>> [ 1.260455] IRQ10 -> 0:10 > >>>>>> [ 1.260462] IRQ11 -> 0:11 > >>>>>> [ 1.260470] IRQ12 -> 0:12 > >>>>>> [ 1.260478] IRQ13 -> 0:13 > >>>>>> [ 1.260485] IRQ14 -> 0:14 > >>>>>> [ 1.260493] IRQ15 -> 0:15 > >>>>>> [ 1.260505] IRQ106 -> 1:8 > >>>>>> [ 1.260513] IRQ112 -> 1:4 > >>>>>> [ 1.260521] IRQ116 -> 1:13 > >>>>>> [ 1.260529] IRQ117 -> 1:14 > >>>>>> [ 1.260537] IRQ118 -> 1:15 > >>>>>> [ 1.260544] .................................... done. > >>>>> > >>>>> And what does Linux think are IRQs 16 ... 105? Have you compared with > >>>>> Linux running baremetal on the same hardware? 
> >>>> > >>>> So I have some emails from Ray from he time he was looking into this, > >>>> and on Linux dom0 PVH dmesg there is: > >>>> > >>>> [ 0.065063] IOAPIC[0]: apic_id 33, version 17, address 0xfec00000, GSI 0-23 > >>>> [ 0.065096] IOAPIC[1]: apic_id 34, version 17, address 0xfec01000, GSI 24-55 > >>>> > >>>> So it seems the vIO-APIC data provided by Xen to dom0 is at least > >>>> consistent. > >>>> > >>>>>> And I think Ray traced the point in Linux where Linux gives us an IRQ == > >>>>>> 112 (which is the one causing issues): > >>>>>> > >>>>>> __acpi_register_gsi-> > >>>>>> acpi_register_gsi_ioapic-> > >>>>>> mp_map_gsi_to_irq-> > >>>>>> mp_map_pin_to_irq-> > >>>>>> __irq_resolve_mapping() > >>>>>> > >>>>>> if (likely(data)) { > >>>>>> desc = irq_data_to_desc(data); > >>>>>> if (irq) > >>>>>> *irq = data->irq; > >>>>>> /* this IRQ is 112, IO-APIC-34 domain */ > >>>>>> } > >>>> > >>>> > >>>> Could this all be a result of patch 4/5 in the Linux series ("[RFC > >>>> PATCH 4/5] x86/xen: acpi registers gsi for xen pvh"), where a different > >>>> __acpi_register_gsi hook is installed for PVH in order to setup GSIs > >>>> using PHYSDEV ops instead of doing it natively from the IO-APIC? > >>>> > >>>> FWIW, the introduced function in that patch > >>>> (acpi_register_gsi_xen_pvh()) seems to unconditionally call > >>>> acpi_register_gsi_ioapic() without checking if the GSI is already > >>>> registered, which might lead to multiple IRQs being allocated for the > >>>> same underlying GSI? > >>> > >>> I understand this point and I think it needs investigating. > >>> > >>> > >>>> As I commented there, I think that approach is wrong. If the GSI has > >>>> not been mapped in Xen (because dom0 hasn't unmasked the respective > >>>> IO-APIC pin) we should add some logic in the toolstack to map it > >>>> before attempting to bind. > >>> > >>> But this statement confuses me. The toolstack doesn't get involved in > >>> IRQ setup for PCI devices for HVM guests? 
> >> > >> It does for GSI interrupts AFAICT, see pci_add_dm_done() and the call > >> to xc_physdev_map_pirq(). I'm not sure whether that's a remnant that > >> cold be removed (maybe for qemu-trad only?) or it's also required by > >> QEMU upstream, I would have to investigate more. > > > > You are right. I am not certain, but it seems like a mistake in the > > toolstack to me. In theory, pci_add_dm_done should only be needed for PV > > guests, not for HVM guests. I am not sure. But I can see the call to > > xc_physdev_map_pirq you were referring to now. > > > > > >> It's my understanding it's in pci_add_dm_done() where Ray was getting > >> the mismatched IRQ vs GSI number. > > > > I think the mismatch was actually caused by the xc_physdev_map_pirq call > > from QEMU, which makes sense because in any case it should happen before > > the same call done by pci_add_dm_done (pci_add_dm_done is called after > > sending the pci passthrough QMP command to QEMU). So the first to hit > > the IRQ!=GSI problem would be QEMU. > > > Sorry for replying to you so late. And thank you all for review. I realized that your questions mainly focus on the following points: 1. Why irq is not equal with gsi? 2. Why I do the translations between irq and gsi? 3. Why I call PHYSDEVOP_map_pirq in acpi_register_gsi_xen_pvh()? 4. Why I call PHYSDEVOP_setup_gsi in acpi_register_gsi_xen_pvh()? > Please forgive me for making a summary response first. And I am looking forward to your comments. Sorry, it's been a bit since that conversation, so my recollection is vague. One of the questions was why acpi_register_gsi_xen_pvh() is needed. I think the patch that introduced it on Linux didn't have much of a commit description. > 1. Why irq is not equal with gsi? > As far as I know, irq is dynamically allocated according to gsi, they are not necessarily equal. > When I run "sudo xl pci-assignable-add 03:00.0" to assign passthrough device(Taking dGPU on my environment as an example, which gsi is 28). 
It will call into acpi_register_gsi_ioapic to get irq, the callstack is: > acpi_register_gsi_ioapic > mp_map_gsi_to_irq > mp_map_pin_to_irq > irq_find_mapping(if gsi has been mapped to an irq before, it will return corresponding irq here) > alloc_irq_from_domain > __irq_domain_alloc_irqs > irq_domain_alloc_descs > __irq_alloc_descs Won't you perform double GSI registrations with Xen if both acpi_register_gsi_ioapic() and acpi_register_gsi_xen_pvh() are used? > > If you add some printings like below: > --------------------------------------------------------------------------------------------------------------------------------------------- > diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c > index a868b76cd3d4..970fd461be7a 100644 > --- a/arch/x86/kernel/apic/io_apic.c > +++ b/arch/x86/kernel/apic/io_apic.c > @@ -1067,6 +1067,8 @@ static int mp_map_pin_to_irq(u32 gsi, int idx, int ioapic, int pin, > } > } > mutex_unlock(&ioapic_mutex); > + printk("cjq_debug mp_map_pin_to_irq gsi: %u, irq: %d, idx: %d, ioapic: %d, pin: %d\n", > + gsi, irq, idx, ioapic, pin); > > return irq; > } > diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c > index 5db0230aa6b5..4e9613abbe96 100644 > --- a/kernel/irq/irqdesc.c > +++ b/kernel/irq/irqdesc.c > @@ -786,6 +786,8 @@ __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node, > start = bitmap_find_next_zero_area(allocated_irqs, IRQ_BITMAP_BITS, > from, cnt, 0); > ret = -EEXIST; > + printk("cjq_debug __irq_alloc_descs irq: %d, from: %u, cnt: %u, node: %d, start: %d, nr_irqs: %d\n", > + irq, from, cnt, node, start, nr_irqs); > if (irq >=0 && start != irq) > goto unlock; > --------------------------------------------------------------------------------------------------------------------------------------------- > You will get output on PVH dom0: > > [ 0.181560] cjq_debug __irq_alloc_descs irq: 1, from: 1, cnt: 1, node: -1, start: 1, nr_irqs: 1096 > [ 0.181639] cjq_debug mp_map_pin_to_irq 
gsi: 1, irq: 1, idx: 2, ioapic: 0, pin: 1 > [ 0.181641] cjq_debug __irq_alloc_descs irq: 2, from: 2, cnt: 1, node: -1, start: 2, nr_irqs: 1096 > [ 0.181682] cjq_debug mp_map_pin_to_irq gsi: 2, irq: 2, idx: 0, ioapic: 0, pin: 2 > [ 0.181683] cjq_debug __irq_alloc_descs irq: 3, from: 3, cnt: 1, node: -1, start: 3, nr_irqs: 1096 > [ 0.181715] cjq_debug mp_map_pin_to_irq gsi: 3, irq: 3, idx: 3, ioapic: 0, pin: 3 > [ 0.181716] cjq_debug __irq_alloc_descs irq: 4, from: 4, cnt: 1, node: -1, start: 4, nr_irqs: 1096 > [ 0.181751] cjq_debug mp_map_pin_to_irq gsi: 4, irq: 4, idx: 4, ioapic: 0, pin: 4 > [ 0.181752] cjq_debug __irq_alloc_descs irq: 5, from: 5, cnt: 1, node: -1, start: 5, nr_irqs: 1096 > [ 0.181783] cjq_debug mp_map_pin_to_irq gsi: 5, irq: 5, idx: 5, ioapic: 0, pin: 5 > [ 0.181784] cjq_debug __irq_alloc_descs irq: 6, from: 6, cnt: 1, node: -1, start: 6, nr_irqs: 1096 > [ 0.181813] cjq_debug mp_map_pin_to_irq gsi: 6, irq: 6, idx: 6, ioapic: 0, pin: 6 > [ 0.181814] cjq_debug __irq_alloc_descs irq: 7, from: 7, cnt: 1, node: -1, start: 7, nr_irqs: 1096 > [ 0.181856] cjq_debug mp_map_pin_to_irq gsi: 7, irq: 7, idx: 7, ioapic: 0, pin: 7 > [ 0.181857] cjq_debug __irq_alloc_descs irq: 8, from: 8, cnt: 1, node: -1, start: 8, nr_irqs: 1096 > [ 0.181888] cjq_debug mp_map_pin_to_irq gsi: 8, irq: 8, idx: 8, ioapic: 0, pin: 8 > [ 0.181889] cjq_debug __irq_alloc_descs irq: 9, from: 9, cnt: 1, node: -1, start: 9, nr_irqs: 1096 > [ 0.181918] cjq_debug mp_map_pin_to_irq gsi: 9, irq: 9, idx: 1, ioapic: 0, pin: 9 > [ 0.181919] cjq_debug __irq_alloc_descs irq: 10, from: 10, cnt: 1, node: -1, start: 10, nr_irqs: 1096 > [ 0.181950] cjq_debug mp_map_pin_to_irq gsi: 10, irq: 10, idx: 9, ioapic: 0, pin: 10 > [ 0.181951] cjq_debug __irq_alloc_descs irq: 11, from: 11, cnt: 1, node: -1, start: 11, nr_irqs: 1096 > [ 0.181977] cjq_debug mp_map_pin_to_irq gsi: 11, irq: 11, idx: 10, ioapic: 0, pin: 11 > [ 0.181979] cjq_debug __irq_alloc_descs irq: 12, from: 12, cnt: 1, node: -1, start: 12, 
nr_irqs: 1096 > [ 0.182006] cjq_debug mp_map_pin_to_irq gsi: 12, irq: 12, idx: 11, ioapic: 0, pin: 12 > [ 0.182007] cjq_debug __irq_alloc_descs irq: 13, from: 13, cnt: 1, node: -1, start: 13, nr_irqs: 1096 > [ 0.182034] cjq_debug mp_map_pin_to_irq gsi: 13, irq: 13, idx: 12, ioapic: 0, pin: 13 > [ 0.182035] cjq_debug __irq_alloc_descs irq: 14, from: 14, cnt: 1, node: -1, start: 14, nr_irqs: 1096 > [ 0.182066] cjq_debug mp_map_pin_to_irq gsi: 14, irq: 14, idx: 13, ioapic: 0, pin: 14 > [ 0.182067] cjq_debug __irq_alloc_descs irq: 15, from: 15, cnt: 1, node: -1, start: 15, nr_irqs: 1096 > [ 0.182095] cjq_debug mp_map_pin_to_irq gsi: 15, irq: 15, idx: 14, ioapic: 0, pin: 15 > [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 24, nr_irqs: 1096 > [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 25, nr_irqs: 1096 > [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 26, nr_irqs: 1096 > [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 27, nr_irqs: 1096 > [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 28, nr_irqs: 1096 > [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 29, nr_irqs: 1096 > [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 30, nr_irqs: 1096 > [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 31, nr_irqs: 1096 > [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 32, nr_irqs: 1096 > [ 0.188491] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 33, nr_irqs: 1096 > [ 0.188491] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 34, nr_irqs: 1096 > [ 0.188491] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 35, nr_irqs: 1096 > [ 0.188491] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, 
node: -1, start: 36, nr_irqs: 1096 > [ 0.188491] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 37, nr_irqs: 1096 > [ 0.192282] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 38, nr_irqs: 1096 > [ 0.192282] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 39, nr_irqs: 1096 > [ 0.192282] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 40, nr_irqs: 1096 > [ 0.192282] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 41, nr_irqs: 1096 > [ 0.192282] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 42, nr_irqs: 1096 > [ 0.196208] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 43, nr_irqs: 1096 > [ 0.196208] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 44, nr_irqs: 1096 > [ 0.196208] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 45, nr_irqs: 1096 > [ 0.196208] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 46, nr_irqs: 1096 > [ 0.196208] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 47, nr_irqs: 1096 > [ 0.198199] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 48, nr_irqs: 1096 > [ 0.198416] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 49, nr_irqs: 1096 > [ 0.198460] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 50, nr_irqs: 1096 > [ 0.198489] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 51, nr_irqs: 1096 > [ 0.198523] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 52, nr_irqs: 1096 > [ 0.201315] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 53, nr_irqs: 1096 > [ 0.202174] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 54, nr_irqs: 1096 > [ 0.202225] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 55, 
nr_irqs: 1096 > [ 0.202259] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 56, nr_irqs: 1096 > [ 0.202291] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 57, nr_irqs: 1096 > [ 0.205239] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 58, nr_irqs: 1096 > [ 0.205239] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 59, nr_irqs: 1096 > [ 0.205239] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 60, nr_irqs: 1096 > [ 0.205239] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 61, nr_irqs: 1096 > [ 0.205239] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 62, nr_irqs: 1096 > [ 0.208653] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 63, nr_irqs: 1096 > [ 0.208653] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 64, nr_irqs: 1096 > [ 0.208653] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 65, nr_irqs: 1096 > [ 0.208653] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 66, nr_irqs: 1096 > [ 0.208653] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 67, nr_irqs: 1096 > [ 0.210169] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 68, nr_irqs: 1096 > [ 0.210322] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 69, nr_irqs: 1096 > [ 0.210370] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 70, nr_irqs: 1096 > [ 0.210403] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 71, nr_irqs: 1096 > [ 0.210436] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 72, nr_irqs: 1096 > [ 0.213190] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 73, nr_irqs: 1096 > [ 0.213190] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 74, nr_irqs: 1096 > [ 
0.213190] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 75, nr_irqs: 1096 > [ 0.213190] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 76, nr_irqs: 1096 > [ 0.214151] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 77, nr_irqs: 1096 > [ 0.217075] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 78, nr_irqs: 1096 > [ 0.217075] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 79, nr_irqs: 1096 > [ 0.217075] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 80, nr_irqs: 1096 > [ 0.217075] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 81, nr_irqs: 1096 > [ 0.217075] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 82, nr_irqs: 1096 > [ 0.220389] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 83, nr_irqs: 1096 > [ 0.220389] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 84, nr_irqs: 1096 > [ 0.220389] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 85, nr_irqs: 1096 > [ 0.220389] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 86, nr_irqs: 1096 > [ 0.220389] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 87, nr_irqs: 1096 > [ 0.222215] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 88, nr_irqs: 1096 > [ 0.222366] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 89, nr_irqs: 1096 > [ 0.222410] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 90, nr_irqs: 1096 > [ 0.222447] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 91, nr_irqs: 1096 > [ 0.222478] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 92, nr_irqs: 1096 > [ 0.225490] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 93, nr_irqs: 1096 > [ 0.226225] cjq_debug 
__irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 94, nr_irqs: 1096 > [ 0.226268] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 95, nr_irqs: 1096 > [ 0.226300] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 96, nr_irqs: 1096 > [ 0.226329] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 97, nr_irqs: 1096 > [ 0.229057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 98, nr_irqs: 1096 > [ 0.229057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 99, nr_irqs: 1096 > [ 0.229057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 100, nr_irqs: 1096 > [ 0.229057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 101, nr_irqs: 1096 > [ 0.229057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 102, nr_irqs: 1096 > [ 0.232399] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 103, nr_irqs: 1096 > [ 0.248854] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 104, nr_irqs: 1096 > [ 0.250609] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 105, nr_irqs: 1096 > [ 0.372343] cjq_debug mp_map_pin_to_irq gsi: 9, irq: 9, idx: 1, ioapic: 0, pin: 9 > [ 0.720950] cjq_debug mp_map_pin_to_irq gsi: 8, irq: 8, idx: 8, ioapic: 0, pin: 8 > [ 0.721052] cjq_debug mp_map_pin_to_irq gsi: 13, irq: 13, idx: 12, ioapic: 0, pin: 13 > [ 1.254825] cjq_debug mp_map_pin_to_irq gsi: 7, irq: -16, idx: 7, ioapic: 0, pin: 7 > [ 1.333081] cjq_debug mp_map_pin_to_irq gsi: 1, irq: 1, idx: 2, ioapic: 0, pin: 1 > [ 1.375882] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 106, nr_irqs: 1096 > [ 1.375951] cjq_debug mp_map_pin_to_irq gsi: 32, irq: 106, idx: -1, ioapic: 1, pin: 8 > [ 1.376072] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 107, nr_irqs: 1096 > [ 1.376121] cjq_debug mp_map_pin_to_irq 
gsi: 37, irq: 107, idx: -1, ioapic: 1, pin: 13 > [ 1.472551] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 107, idx: -1, ioapic: 1, pin: 13 > [ 1.472697] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 107, nr_irqs: 1096 > [ 1.472751] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 107, idx: -1, ioapic: 1, pin: 14 > [ 1.484290] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 107, idx: -1, ioapic: 1, pin: 14 > [ 1.768163] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 107, nr_irqs: 1096 > [ 1.768627] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 108, nr_irqs: 1096 > [ 1.769059] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 109, nr_irqs: 1096 > [ 1.769694] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 110, nr_irqs: 1096 > [ 1.770169] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 111, nr_irqs: 1096 > [ 1.770697] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 112, nr_irqs: 1096 > [ 1.770738] cjq_debug mp_map_pin_to_irq gsi: 28, irq: 112, idx: -1, ioapic: 1, pin: 4 > [ 1.770789] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 113, nr_irqs: 1096 > [ 1.771230] cjq_debug mp_map_pin_to_irq gsi: 28, irq: 112, idx: -1, ioapic: 1, pin: 4 > [ 1.771278] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 114, nr_irqs: 1096 > [ 2.127884] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 115, nr_irqs: 1096 > [ 3.207419] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 116, nr_irqs: 1096 > [ 3.207730] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 116, idx: -1, ioapic: 1, pin: 13 > [ 3.208120] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 117, nr_irqs: 1096 > [ 3.208475] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 117, idx: -1, ioapic: 1, pin: 12 > [ 3.208478] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, 
node: -1, start: 118, nr_irqs: 1096 > [ 3.208861] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 116, idx: -1, ioapic: 1, pin: 13 > [ 3.208933] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 119, nr_irqs: 1096 > [ 3.209127] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 120, nr_irqs: 1096 > [ 3.209383] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 121, nr_irqs: 1096 > [ 3.209863] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 122, nr_irqs: 1096 > [ 3.211439] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 123, nr_irqs: 1096 > [ 3.211833] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 124, nr_irqs: 1096 > [ 3.212873] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 125, nr_irqs: 1096 > [ 3.243514] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 126, nr_irqs: 1096 > [ 3.243689] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 126, idx: -1, ioapic: 1, pin: 14 > [ 3.244293] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 127, nr_irqs: 1096 > [ 3.244534] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 128, nr_irqs: 1096 > [ 3.244714] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 129, nr_irqs: 1096 > [ 3.244911] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 130, nr_irqs: 1096 > [ 3.245096] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 131, nr_irqs: 1096 > [ 3.245633] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 132, nr_irqs: 1096 > [ 3.247890] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 133, nr_irqs: 1096 > [ 3.248192] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 134, nr_irqs: 1096 > [ 3.271093] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 135, nr_irqs: 1096 
> [ 3.307045] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 136, nr_irqs: 1096 > [ 3.307162] cjq_debug mp_map_pin_to_irq gsi: 48, irq: 136, idx: -1, ioapic: 1, pin: 24 > [ 3.307223] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 137, nr_irqs: 1096 > [ 3.331183] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 137, nr_irqs: 1096 > [ 3.331295] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 138, nr_irqs: 1096 > [ 3.331366] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 139, nr_irqs: 1096 > [ 3.331438] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 140, nr_irqs: 1096 > [ 3.331511] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 141, nr_irqs: 1096 > [ 3.331579] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 142, nr_irqs: 1096 > [ 3.331646] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 143, nr_irqs: 1096 > [ 3.331713] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 144, nr_irqs: 1096 > [ 3.331780] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 145, nr_irqs: 1096 > [ 3.331846] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 146, nr_irqs: 1096 > [ 3.331913] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 147, nr_irqs: 1096 > [ 3.331984] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 148, nr_irqs: 1096 > [ 3.332051] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 149, nr_irqs: 1096 > [ 3.332118] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 150, nr_irqs: 1096 > [ 3.332183] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 151, nr_irqs: 1096 > [ 3.332252] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 152, nr_irqs: 1096 > [ 3.332319] cjq_debug 
__irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 153, nr_irqs: 1096 > [ 8.010370] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 116, idx: -1, ioapic: 1, pin: 13 > [ 9.545439] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 117, idx: -1, ioapic: 1, pin: 12 > [ 9.545713] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 154, nr_irqs: 1096 > [ 9.546034] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 155, nr_irqs: 1096 > [ 9.687796] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 156, nr_irqs: 1096 > [ 9.687979] cjq_debug mp_map_pin_to_irq gsi: 39, irq: 156, idx: -1, ioapic: 1, pin: 15 > [ 9.688057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 157, nr_irqs: 1096 > [ 9.921038] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 158, nr_irqs: 1096 > [ 9.921210] cjq_debug mp_map_pin_to_irq gsi: 29, irq: 158, idx: -1, ioapic: 1, pin: 5 > [ 9.921403] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 159, nr_irqs: 1096 > [ 9.926373] cjq_debug mp_map_pin_to_irq gsi: 39, irq: 156, idx: -1, ioapic: 1, pin: 15 > [ 9.926747] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 160, nr_irqs: 1096 > [ 9.928201] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 117, idx: -1, ioapic: 1, pin: 12 > [ 9.928488] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 161, nr_irqs: 1096 > [ 10.653915] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 162, nr_irqs: 1096 > [ 10.656257] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 163, nr_irqs: 1096 > > You can see that the IRQ allocation is not always based on the GSI value. It follows a first-requested, first-served principle: for example, GSI 32 gets IRQ 106 while GSI 28 gets IRQ 112. And acpi_register_gsi_ioapic() is not the only caller of __irq_alloc_descs(); other functions call it as well, some even earlier. 
> The above output behaves just like bare metal, so we can conclude that irq != gsi. For comparison, see the output below from native Linux: It does seem weird to me that it does identity map legacy IRQs (<16), but then for GSI >= 16 it starts assigning IRQs in the 100 range. What uses the IRQ range [24, 105]? Also IIRC on a PV dom0 GSIs are identity mapped to IRQs on Linux? Or maybe that's just a side effect of GSIs being identity mapped into PIRQs by Xen? > [ 0.105053] cjq_debug mp_map_pin_to_irq gsi: 1, irq: 1, idx: 2, ioapic: 0, pin: 1 > [ 0.105061] cjq_debug mp_map_pin_to_irq gsi: 2, irq: 0, idx: 0, ioapic: 0, pin: 2 > [ 0.105069] cjq_debug mp_map_pin_to_irq gsi: 3, irq: 3, idx: 3, ioapic: 0, pin: 3 > [ 0.105078] cjq_debug mp_map_pin_to_irq gsi: 4, irq: 4, idx: 4, ioapic: 0, pin: 4 > [ 0.105086] cjq_debug mp_map_pin_to_irq gsi: 5, irq: 5, idx: 5, ioapic: 0, pin: 5 > [ 0.105094] cjq_debug mp_map_pin_to_irq gsi: 6, irq: 6, idx: 6, ioapic: 0, pin: 6 > [ 0.105103] cjq_debug mp_map_pin_to_irq gsi: 7, irq: 7, idx: 7, ioapic: 0, pin: 7 > [ 0.105111] cjq_debug mp_map_pin_to_irq gsi: 8, irq: 8, idx: 8, ioapic: 0, pin: 8 > [ 0.105119] cjq_debug mp_map_pin_to_irq gsi: 9, irq: 9, idx: 1, ioapic: 0, pin: 9 > [ 0.105127] cjq_debug mp_map_pin_to_irq gsi: 10, irq: 10, idx: 9, ioapic: 0, pin: 10 > [ 0.105136] cjq_debug mp_map_pin_to_irq gsi: 11, irq: 11, idx: 10, ioapic: 0, pin: 11 > [ 0.105144] cjq_debug mp_map_pin_to_irq gsi: 12, irq: 12, idx: 11, ioapic: 0, pin: 12 > [ 0.105152] cjq_debug mp_map_pin_to_irq gsi: 13, irq: 13, idx: 12, ioapic: 0, pin: 13 > [ 0.105160] cjq_debug mp_map_pin_to_irq gsi: 14, irq: 14, idx: 13, ioapic: 0, pin: 14 > [ 0.105169] cjq_debug mp_map_pin_to_irq gsi: 15, irq: 15, idx: 14, ioapic: 0, pin: 15 > [ 0.398134] cjq_debug mp_map_pin_to_irq gsi: 9, irq: 9, idx: 1, ioapic: 0, pin: 9 > [ 1.169293] cjq_debug mp_map_pin_to_irq gsi: 8, irq: 8, idx: 8, ioapic: 0, pin: 8 > [ 1.169394] cjq_debug mp_map_pin_to_irq gsi: 13, irq: 13, idx: 12, ioapic: 0, pin: 13 > [ 1.323132] cjq_debug 
mp_map_pin_to_irq gsi: 7, irq: 7, idx: 7, ioapic: 0, pin: 7 > [ 1.345425] cjq_debug mp_map_pin_to_irq gsi: 1, irq: 1, idx: 2, ioapic: 0, pin: 1 > [ 1.375502] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 24, nr_irqs: 1096 > [ 1.375575] cjq_debug mp_map_pin_to_irq gsi: 32, irq: 24, idx: -1, ioapic: 1, pin: 8 > [ 1.375661] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 25, nr_irqs: 1096 > [ 1.375705] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 25, idx: -1, ioapic: 1, pin: 13 > [ 1.442277] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 25, idx: -1, ioapic: 1, pin: 13 > [ 1.442393] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 25, nr_irqs: 1096 > [ 1.442450] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 25, idx: -1, ioapic: 1, pin: 14 > [ 1.453893] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 25, idx: -1, ioapic: 1, pin: 14 > [ 1.456127] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 25, nr_irqs: 1096 > [ 1.734065] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 26, nr_irqs: 1096 > [ 1.734165] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 27, nr_irqs: 1096 > [ 1.734253] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 28, nr_irqs: 1096 > [ 1.734344] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 29, nr_irqs: 1096 > [ 1.734426] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 30, nr_irqs: 1096 > [ 1.734512] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 31, nr_irqs: 1096 > [ 1.734597] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 32, nr_irqs: 1096 > [ 1.734643] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 33, nr_irqs: 1096 > [ 1.734687] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 34, nr_irqs: 1096 > [ 1.734728] cjq_debug __irq_alloc_descs irq: -1, from: 24, 
cnt: 1, node: -1, start: 35, nr_irqs: 1096 > [ 1.735017] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 36, nr_irqs: 1096 > [ 1.735252] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 37, nr_irqs: 1096 > [ 1.735467] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 38, nr_irqs: 1096 > [ 1.735799] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 39, nr_irqs: 1096 > [ 1.736024] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 40, nr_irqs: 1096 > [ 1.736364] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 41, nr_irqs: 1096 > [ 1.736406] cjq_debug mp_map_pin_to_irq gsi: 28, irq: 41, idx: -1, ioapic: 1, pin: 4 > [ 1.736434] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 42, nr_irqs: 1096 > [ 1.736701] cjq_debug mp_map_pin_to_irq gsi: 28, irq: 41, idx: -1, ioapic: 1, pin: 4 > [ 1.736724] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 43, nr_irqs: 1096 > [ 3.037123] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 44, nr_irqs: 1096 > [ 3.037313] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 44, idx: -1, ioapic: 1, pin: 13 > [ 3.037515] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 44, idx: -1, ioapic: 1, pin: 13 > [ 3.037738] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 45, nr_irqs: 1096 > [ 3.037959] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 46, nr_irqs: 1096 > [ 3.038073] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 47, nr_irqs: 1096 > [ 3.038154] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 48, nr_irqs: 1096 > [ 3.038179] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 47, idx: -1, ioapic: 1, pin: 12 > [ 3.038277] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 49, nr_irqs: 1096 > [ 3.038399] cjq_debug __irq_alloc_descs irq: -1, from: 24, 
cnt: 1, node: -1, start: 50, nr_irqs: 1096 > [ 3.038525] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 51, nr_irqs: 1096 > [ 3.038657] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 52, nr_irqs: 1096 > [ 3.038852] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 53, nr_irqs: 1096 > [ 3.052377] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 54, nr_irqs: 1096 > [ 3.052479] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 54, idx: -1, ioapic: 1, pin: 14 > [ 3.052730] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 55, nr_irqs: 1096 > [ 3.052840] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 56, nr_irqs: 1096 > [ 3.052918] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 57, nr_irqs: 1096 > [ 3.052987] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 58, nr_irqs: 1096 > [ 3.053069] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 59, nr_irqs: 1096 > [ 3.053139] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 60, nr_irqs: 1096 > [ 3.053201] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 61, nr_irqs: 1096 > [ 3.053260] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 62, nr_irqs: 1096 > [ 3.089128] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 63, nr_irqs: 1096 > [ 3.089310] cjq_debug mp_map_pin_to_irq gsi: 48, irq: 63, idx: -1, ioapic: 1, pin: 24 > [ 3.089376] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 64, nr_irqs: 1096 > [ 3.103435] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 65, nr_irqs: 1096 > [ 3.114190] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 64, nr_irqs: 1096 > [ 3.114346] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 66, nr_irqs: 1096 > [ 3.121215] 
cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 67, nr_irqs: 1096 > [ 3.121350] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 68, nr_irqs: 1096 > [ 3.121479] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 69, nr_irqs: 1096 > [ 3.121612] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 70, nr_irqs: 1096 > [ 3.121726] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 71, nr_irqs: 1096 > [ 3.121841] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 72, nr_irqs: 1096 > [ 3.121955] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 73, nr_irqs: 1096 > [ 3.122025] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 74, nr_irqs: 1096 > [ 3.122093] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 75, nr_irqs: 1096 > [ 3.122148] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 76, nr_irqs: 1096 > [ 3.122203] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 77, nr_irqs: 1096 > [ 3.122265] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 78, nr_irqs: 1096 > [ 3.122322] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 79, nr_irqs: 1096 > [ 3.122378] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 80, nr_irqs: 1096 > [ 3.122433] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 81, nr_irqs: 1096 > [ 7.838753] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 44, idx: -1, ioapic: 1, pin: 13 > [ 9.619174] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 47, idx: -1, ioapic: 1, pin: 12 > [ 9.619556] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 82, nr_irqs: 1096 > [ 9.622038] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 83, nr_irqs: 1096 > [ 9.634900] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, 
start: 84, nr_irqs: 1096 > [ 9.635316] cjq_debug mp_map_pin_to_irq gsi: 39, irq: 84, idx: -1, ioapic: 1, pin: 15 > [ 9.635405] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 85, nr_irqs: 1096 > [ 10.006686] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 86, nr_irqs: 1096 > [ 10.006823] cjq_debug mp_map_pin_to_irq gsi: 29, irq: 86, idx: -1, ioapic: 1, pin: 5 > [ 10.007009] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 87, nr_irqs: 1096 > [ 10.008723] cjq_debug mp_map_pin_to_irq gsi: 39, irq: 84, idx: -1, ioapic: 1, pin: 15 > [ 10.009853] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 88, nr_irqs: 1096 > [ 10.010786] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 47, idx: -1, ioapic: 1, pin: 12 > [ 10.010858] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 89, nr_irqs: 1096 > > 2. Why do I translate between IRQ and GSI? > > From question 1 we know that irq != gsi. And I found that in QEMU, pci_qdev_realize->xen_pt_realize->xen_host_pci_device_get->xen_host_pci_get_hex_value reads the IRQ number, but later pci_qdev_realize->xen_pt_realize->xc_physdev_map_pirq requires the GSI to be passed in, so there is a mismatch. For some reason on a PV dom0 xen_host_pci_get_hex_value will return the IRQ that's identity mapped to the GSI. Is that because a PV dom0 will use acpi_register_gsi_xen() instead of acpi_register_gsi_ioapic()? > It will call into Xen's physdev_map_pirq->allocate_and_map_gsi_pirq to allocate a pirq for the GSI, and that is where the error occurred. > Not only that: the callback function pci_add_dm_done->xc_physdev_map_pirq also needs the GSI. > > So, I added the function xc_physdev_gsi_from_irq() to translate the IRQ to the GSI for QEMU. > > And I didn't find a similar function in the existing Linux code; I think only "QEMU passthrough for Xen" needs this translation, so I added it to privcmd. 
If you know of any other similar functions or more suitable places, please feel free to tell me. > > 3. Why do I call PHYSDEVOP_map_pirq in acpi_register_gsi_xen_pvh()? > > Because if you want to map a gsi for a domU, it must have a mapping in dom0 first. See the QEMU code: > pci_add_dm_done > xc_physdev_map_pirq > xc_domain_irq_permission > XEN_DOMCTL_irq_permission > pirq_access_permitted > xc_physdev_map_pirq will get the pirq mapped from the gsi, and xc_domain_irq_permission will use that pirq to call into Xen. If we don't do PHYSDEVOP_map_pirq for passthrough devices on PVH dom0, then pirq_access_permitted will get a NULL irq from dom0 and fail. I'm not sure of this specific case, but we shouldn't attempt to fit the same exact PCI pass through workflow that a PV dom0 uses into a PVH dom0. IOW: it might make sense to diverge some paths in order to avoid importing PV specific concepts into PVH without a reason. > So, I added PHYSDEVOP_map_pirq for PVH dom0. But I think it is only necessary for passthrough devices, not for all devices which call __acpi_register_gsi. In the next version of the patch, I will restrict PHYSDEVOP_map_pirq to passthrough devices only. > > 4. Why do I call PHYSDEVOP_setup_gsi in acpi_register_gsi_xen_pvh()? > > As Roger commented, the gsi of the passthrough device is never unmasked and registered (I added printk messages in vioapic_hwdom_map_gsi() and found that it is never called for the dGPU with gsi 28 in my environment). > So, I called PHYSDEVOP_setup_gsi to register the gsi. > But I agree with Roger's and Jan's opinion that it is wrong to do PHYSDEVOP_setup_gsi for all devices. > So, in the next version of the patch, I will also restrict PHYSDEVOP_setup_gsi to passthrough devices only. Right, given how long it's been since the last series, I think we need a new series posted in order to see how this looks now. Thanks, Roger.
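The ordering problem described in question 2 can be sketched in isolation. Everything below is a hypothetical illustration, not the real libxc implementation: mock_gsi_from_irq() and mock_map_pirq() are stand-ins for xc_physdev_gsi_from_irq() and xc_physdev_map_pirq(), wired up with the example values from this thread (Linux IRQ 112 backing GSI 28 on PVH dom0, and the two IO-APICs covering GSIs 0-55):

```c
/* Hypothetical stand-in for the new xc_physdev_gsi_from_irq():
 * translate the Linux IRQ read from sysfs back to the GSI Xen expects.
 * Only the example mapping from this thread is modelled. */
int mock_gsi_from_irq(int irq)
{
    return irq == 112 ? 28 : -1;
}

/* Hypothetical stand-in for xc_physdev_map_pirq(): Xen's
 * allocate_and_map_gsi_pirq() expects a GSI, so an untranslated Linux
 * IRQ such as 112 lies outside the GSI range and is rejected. */
int mock_map_pirq(int gsi, int *pirq)
{
    if (gsi < 0 || gsi > 55)   /* GSIs 0-55 per the two IO-APICs above */
        return -1;
    *pirq = gsi;               /* identity pirq, purely for the sketch */
    return 0;
}
```

So a caller that reads IRQ 112 from the device's sysfs irq node has to translate it to GSI 28 before the map call can succeed, which is the ordering the pci_add_dm_done() hunk in this series adds.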
Thanks Roger, we will send a new series after the Xen 4.18 code freeze. On 2023/8/23 16:57, Roger Pau Monné wrote: > On Mon, Jul 31, 2023 at 04:40:35PM +0000, Chen, Jiqian wrote: >> Hi, >> >> On 2023/3/18 04:55, Stefano Stabellini wrote: >>> On Fri, 17 Mar 2023, Roger Pau Monné wrote: >>>> On Fri, Mar 17, 2023 at 11:15:37AM -0700, Stefano Stabellini wrote: >>>>> On Fri, 17 Mar 2023, Roger Pau Monné wrote: >>>>>> On Fri, Mar 17, 2023 at 09:39:52AM +0100, Jan Beulich wrote: >>>>>>> On 17.03.2023 00:19, Stefano Stabellini wrote: >>>>>>>> On Thu, 16 Mar 2023, Jan Beulich wrote: >>>>>>>>> So yes, it then all boils down to that Linux- >>>>>>>>> internal question. >>>>>>>> >>>>>>>> Excellent question but we'll have to wait for Ray as he is the one with >>>>>>>> access to the hardware. But I have this data I can share in the >>>>>>>> meantime: >>>>>>>> >>>>>>>> [ 1.260378] IRQ to pin mappings: >>>>>>>> [ 1.260387] IRQ1 -> 0:1 >>>>>>>> [ 1.260395] IRQ2 -> 0:2 >>>>>>>> [ 1.260403] IRQ3 -> 0:3 >>>>>>>> [ 1.260410] IRQ4 -> 0:4 >>>>>>>> [ 1.260418] IRQ5 -> 0:5 >>>>>>>> [ 1.260425] IRQ6 -> 0:6 >>>>>>>> [ 1.260432] IRQ7 -> 0:7 >>>>>>>> [ 1.260440] IRQ8 -> 0:8 >>>>>>>> [ 1.260447] IRQ9 -> 0:9 >>>>>>>> [ 1.260455] IRQ10 -> 0:10 >>>>>>>> [ 1.260462] IRQ11 -> 0:11 >>>>>>>> [ 1.260470] IRQ12 -> 0:12 >>>>>>>> [ 1.260478] IRQ13 -> 0:13 >>>>>>>> [ 1.260485] IRQ14 -> 0:14 >>>>>>>> [ 1.260493] IRQ15 -> 0:15 >>>>>>>> [ 1.260505] IRQ106 -> 1:8 >>>>>>>> [ 1.260513] IRQ112 -> 1:4 >>>>>>>> [ 1.260521] IRQ116 -> 1:13 >>>>>>>> [ 1.260529] IRQ117 -> 1:14 >>>>>>>> [ 1.260537] IRQ118 -> 1:15 >>>>>>>> [ 1.260544] .................................... done. >>>>>>> >>>>>>> And what does Linux think are IRQs 16 ... 105? Have you compared with >>>>>>> Linux running baremetal on the same hardware?
>>>>>> So I have some emails from Ray from the time he was looking into this, >>>>>> and on Linux dom0 PVH dmesg there is: >>>>>> >>>>>> [ 0.065063] IOAPIC[0]: apic_id 33, version 17, address 0xfec00000, GSI 0-23 >>>>>> [ 0.065096] IOAPIC[1]: apic_id 34, version 17, address 0xfec01000, GSI 24-55 >>>>>> >>>>>> So it seems the vIO-APIC data provided by Xen to dom0 is at least >>>>>> consistent. >>>>>> >>>>>>>> And I think Ray traced the point in Linux where Linux gives us an IRQ == >>>>>>>> 112 (which is the one causing issues): >>>>>>>> >>>>>>>> __acpi_register_gsi-> >>>>>>>> acpi_register_gsi_ioapic-> >>>>>>>> mp_map_gsi_to_irq-> >>>>>>>> mp_map_pin_to_irq-> >>>>>>>> __irq_resolve_mapping() >>>>>>>> >>>>>>>> if (likely(data)) { >>>>>>>> desc = irq_data_to_desc(data); >>>>>>>> if (irq) >>>>>>>> *irq = data->irq; >>>>>>>> /* this IRQ is 112, IO-APIC-34 domain */ >>>>>>>> } >>>>>> >>>>>> >>>>>> Could this all be a result of patch 4/5 in the Linux series ("[RFC >>>>>> PATCH 4/5] x86/xen: acpi registers gsi for xen pvh"), where a different >>>>>> __acpi_register_gsi hook is installed for PVH in order to setup GSIs >>>>>> using PHYSDEV ops instead of doing it natively from the IO-APIC? >>>>>> >>>>>> FWIW, the introduced function in that patch >>>>>> (acpi_register_gsi_xen_pvh()) seems to unconditionally call >>>>>> acpi_register_gsi_ioapic() without checking if the GSI is already >>>>>> registered, which might lead to multiple IRQs being allocated for the >>>>>> same underlying GSI? >>>>> >>>>> I understand this point and I think it needs investigating. >>>>> >>>>> >>>>>> As I commented there, I think that approach is wrong. If the GSI has >>>>>> not been mapped in Xen (because dom0 hasn't unmasked the respective >>>>>> IO-APIC pin) we should add some logic in the toolstack to map it >>>>>> before attempting to bind. >>>>> >>>>> But this statement confuses me. The toolstack doesn't get involved in >>>>> IRQ setup for PCI devices for HVM guests?
>>>> It does for GSI interrupts AFAICT, see pci_add_dm_done() and the call >>>> to xc_physdev_map_pirq(). I'm not sure whether that's a remnant that >>>> could be removed (maybe for qemu-trad only?) or it's also required by >>>> QEMU upstream, I would have to investigate more. >>> >>> You are right. I am not certain, but it seems like a mistake in the >>> toolstack to me. In theory, pci_add_dm_done should only be needed for PV >>> guests, not for HVM guests. I am not sure. But I can see the call to >>> xc_physdev_map_pirq you were referring to now. >>> >>> >>>> It's my understanding it's in pci_add_dm_done() where Ray was getting >>>> the mismatched IRQ vs GSI number. >>> >>> I think the mismatch was actually caused by the xc_physdev_map_pirq call >>> from QEMU, which makes sense because in any case it should happen before >>> the same call done by pci_add_dm_done (pci_add_dm_done is called after >>> sending the pci passthrough QMP command to QEMU). So the first to hit >>> the IRQ!=GSI problem would be QEMU. >> >> >> Sorry for replying so late, and thank you all for the review. I realized that your questions mainly focus on the following points: 1. Why is the irq not equal to the gsi? 2. Why do I translate between the irq and the gsi? 3. Why do I call PHYSDEVOP_map_pirq in acpi_register_gsi_xen_pvh()? 4. Why do I call PHYSDEVOP_setup_gsi in acpi_register_gsi_xen_pvh()? >> Please forgive me for giving a summary response first; I am looking forward to your comments. > > Sorry, it's been a bit since that conversation, so my recollection is > vague. > > One of the questions was why acpi_register_gsi_xen_pvh() is needed. I > think the patch that introduced it on Linux didn't have much of a > commit description. PVH and baremetal both use acpi_register_gsi_ioapic to allocate an irq for a gsi. I added the function acpi_register_gsi_xen_pvh to replace acpi_register_gsi_ioapic for PVH, so that I can do something special for PVH, like map_pirq, setup_gsi, etc. > >> 1.
Why is the irq not equal to the gsi? >> As far as I know, the irq is dynamically allocated according to the gsi; they are not necessarily equal. >> When I run "sudo xl pci-assignable-add 03:00.0" to assign a passthrough device (taking the dGPU in my environment, whose gsi is 28, as an example), it calls into acpi_register_gsi_ioapic to get the irq; the callstack is: >> acpi_register_gsi_ioapic >> mp_map_gsi_to_irq >> mp_map_pin_to_irq >> irq_find_mapping (if the gsi has been mapped to an irq before, it returns the corresponding irq here) >> alloc_irq_from_domain >> __irq_domain_alloc_irqs >> irq_domain_alloc_descs >> __irq_alloc_descs > > Won't you perform double GSI registrations with Xen if both > acpi_register_gsi_ioapic() and acpi_register_gsi_xen_pvh() are used? In the original PVH code, __acpi_register_gsi is set to acpi_register_gsi_ioapic in the call stack start_kernel->setup_arch->acpi_boot_init->acpi_process_madt->acpi_set_irq_model_ioapic. In my code, I use acpi_register_gsi_xen_pvh to replace acpi_register_gsi_ioapic in the call stack start_kernel-> init_IRQ-> xen_init_IRQ-> pci_xen_pvh_init. So acpi_register_gsi_ioapic will be called only once.
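The allocation behaviour described above (dynamic IRQ descriptors handed out first come, first served from ioapic_dynirq_base, with the irq_find_mapping() step returning the existing IRQ for an already-registered GSI) can be condensed into a toy allocator. This is a hypothetical illustration, not kernel code; the constants 24 and 56 are the values quoted in this thread:

```c
#define DYNIRQ_BASE 24          /* ioapic_dynirq_base in this thread */
#define GSI_TOP     56          /* one past the last GSI (GSIs 0-55) */

static int gsi_to_irq[GSI_TOP]; /* 0 = this GSI not mapped yet */
static int next_irq = DYNIRQ_BASE;

/* Non-GSI users (IPIs, MSIs, event channels) also consume descriptors,
 * which is what pushes later GSI mappings into the 100+ IRQ range. */
int toy_alloc_irq(void)
{
    return next_irq++;
}

/* Toy version of the mp_map_pin_to_irq() path: legacy ISA GSIs are
 * identity mapped, a previously mapped GSI is reused (the
 * irq_find_mapping() case, so no double allocation), and a new GSI
 * gets the next free descriptor. */
int toy_register_gsi(int gsi)
{
    if (gsi < 0 || gsi >= GSI_TOP)
        return -1;
    if (gsi < 16)
        return gsi;
    if (gsi_to_irq[gsi])
        return gsi_to_irq[gsi];
    return gsi_to_irq[gsi] = next_irq++;
}
```

With 82 unrelated allocations first (IRQs 24-105, as in the PVH log above), the next GSI to be registered would land on IRQ 106, and repeated registrations of the same GSI return the same IRQ, which is why replacing the hook rather than calling both keeps a single registration per GSI.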
> >> >> If you add some printings like below: >> --------------------------------------------------------------------------------------------------------------------------------------------- >> diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c >> index a868b76cd3d4..970fd461be7a 100644 >> --- a/arch/x86/kernel/apic/io_apic.c >> +++ b/arch/x86/kernel/apic/io_apic.c >> @@ -1067,6 +1067,8 @@ static int mp_map_pin_to_irq(u32 gsi, int idx, int ioapic, int pin, >> } >> } >> mutex_unlock(&ioapic_mutex); >> + printk("cjq_debug mp_map_pin_to_irq gsi: %u, irq: %d, idx: %d, ioapic: %d, pin: %d\n", >> + gsi, irq, idx, ioapic, pin); >> >> return irq; >> } >> diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c >> index 5db0230aa6b5..4e9613abbe96 100644 >> --- a/kernel/irq/irqdesc.c >> +++ b/kernel/irq/irqdesc.c >> @@ -786,6 +786,8 @@ __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node, >> start = bitmap_find_next_zero_area(allocated_irqs, IRQ_BITMAP_BITS, >> from, cnt, 0); >> ret = -EEXIST; >> + printk("cjq_debug __irq_alloc_descs irq: %d, from: %u, cnt: %u, node: %d, start: %d, nr_irqs: %d\n", >> + irq, from, cnt, node, start, nr_irqs); >> if (irq >=0 && start != irq) >> goto unlock; >> --------------------------------------------------------------------------------------------------------------------------------------------- >> You will get output on PVH dom0: >> >> [ 0.181560] cjq_debug __irq_alloc_descs irq: 1, from: 1, cnt: 1, node: -1, start: 1, nr_irqs: 1096 >> [ 0.181639] cjq_debug mp_map_pin_to_irq gsi: 1, irq: 1, idx: 2, ioapic: 0, pin: 1 >> [ 0.181641] cjq_debug __irq_alloc_descs irq: 2, from: 2, cnt: 1, node: -1, start: 2, nr_irqs: 1096 >> [ 0.181682] cjq_debug mp_map_pin_to_irq gsi: 2, irq: 2, idx: 0, ioapic: 0, pin: 2 >> [ 0.181683] cjq_debug __irq_alloc_descs irq: 3, from: 3, cnt: 1, node: -1, start: 3, nr_irqs: 1096 >> [ 0.181715] cjq_debug mp_map_pin_to_irq gsi: 3, irq: 3, idx: 3, ioapic: 0, pin: 3 >> [ 
0.181716] cjq_debug __irq_alloc_descs irq: 4, from: 4, cnt: 1, node: -1, start: 4, nr_irqs: 1096 >> [ 0.181751] cjq_debug mp_map_pin_to_irq gsi: 4, irq: 4, idx: 4, ioapic: 0, pin: 4 >> [ 0.181752] cjq_debug __irq_alloc_descs irq: 5, from: 5, cnt: 1, node: -1, start: 5, nr_irqs: 1096 >> [ 0.181783] cjq_debug mp_map_pin_to_irq gsi: 5, irq: 5, idx: 5, ioapic: 0, pin: 5 >> [ 0.181784] cjq_debug __irq_alloc_descs irq: 6, from: 6, cnt: 1, node: -1, start: 6, nr_irqs: 1096 >> [ 0.181813] cjq_debug mp_map_pin_to_irq gsi: 6, irq: 6, idx: 6, ioapic: 0, pin: 6 >> [ 0.181814] cjq_debug __irq_alloc_descs irq: 7, from: 7, cnt: 1, node: -1, start: 7, nr_irqs: 1096 >> [ 0.181856] cjq_debug mp_map_pin_to_irq gsi: 7, irq: 7, idx: 7, ioapic: 0, pin: 7 >> [ 0.181857] cjq_debug __irq_alloc_descs irq: 8, from: 8, cnt: 1, node: -1, start: 8, nr_irqs: 1096 >> [ 0.181888] cjq_debug mp_map_pin_to_irq gsi: 8, irq: 8, idx: 8, ioapic: 0, pin: 8 >> [ 0.181889] cjq_debug __irq_alloc_descs irq: 9, from: 9, cnt: 1, node: -1, start: 9, nr_irqs: 1096 >> [ 0.181918] cjq_debug mp_map_pin_to_irq gsi: 9, irq: 9, idx: 1, ioapic: 0, pin: 9 >> [ 0.181919] cjq_debug __irq_alloc_descs irq: 10, from: 10, cnt: 1, node: -1, start: 10, nr_irqs: 1096 >> [ 0.181950] cjq_debug mp_map_pin_to_irq gsi: 10, irq: 10, idx: 9, ioapic: 0, pin: 10 >> [ 0.181951] cjq_debug __irq_alloc_descs irq: 11, from: 11, cnt: 1, node: -1, start: 11, nr_irqs: 1096 >> [ 0.181977] cjq_debug mp_map_pin_to_irq gsi: 11, irq: 11, idx: 10, ioapic: 0, pin: 11 >> [ 0.181979] cjq_debug __irq_alloc_descs irq: 12, from: 12, cnt: 1, node: -1, start: 12, nr_irqs: 1096 >> [ 0.182006] cjq_debug mp_map_pin_to_irq gsi: 12, irq: 12, idx: 11, ioapic: 0, pin: 12 >> [ 0.182007] cjq_debug __irq_alloc_descs irq: 13, from: 13, cnt: 1, node: -1, start: 13, nr_irqs: 1096 >> [ 0.182034] cjq_debug mp_map_pin_to_irq gsi: 13, irq: 13, idx: 12, ioapic: 0, pin: 13 >> [ 0.182035] cjq_debug __irq_alloc_descs irq: 14, from: 14, cnt: 1, node: -1, start: 14, nr_irqs: 1096 >> 
[ 0.182066] cjq_debug mp_map_pin_to_irq gsi: 14, irq: 14, idx: 13, ioapic: 0, pin: 14 >> [ 0.182067] cjq_debug __irq_alloc_descs irq: 15, from: 15, cnt: 1, node: -1, start: 15, nr_irqs: 1096 >> [ 0.182095] cjq_debug mp_map_pin_to_irq gsi: 15, irq: 15, idx: 14, ioapic: 0, pin: 15 >> [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 24, nr_irqs: 1096 >> [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 25, nr_irqs: 1096 >> [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 26, nr_irqs: 1096 >> [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 27, nr_irqs: 1096 >> [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 28, nr_irqs: 1096 >> [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 29, nr_irqs: 1096 >> [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 30, nr_irqs: 1096 >> [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 31, nr_irqs: 1096 >> [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 32, nr_irqs: 1096 >> [ 0.188491] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 33, nr_irqs: 1096 >> [ 0.188491] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 34, nr_irqs: 1096 >> [ 0.188491] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 35, nr_irqs: 1096 >> [ 0.188491] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 36, nr_irqs: 1096 >> [ 0.188491] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 37, nr_irqs: 1096 >> [ 0.192282] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 38, nr_irqs: 1096 >> [ 0.192282] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 39, nr_irqs: 1096 >> [ 0.192282] cjq_debug 
__irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 40, nr_irqs: 1096 >> [ 0.192282] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 41, nr_irqs: 1096 >> [ 0.192282] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 42, nr_irqs: 1096 >> [ 0.196208] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 43, nr_irqs: 1096 >> [ 0.196208] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 44, nr_irqs: 1096 >> [ 0.196208] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 45, nr_irqs: 1096 >> [ 0.196208] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 46, nr_irqs: 1096 >> [ 0.196208] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 47, nr_irqs: 1096 >> [ 0.198199] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 48, nr_irqs: 1096 >> [ 0.198416] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 49, nr_irqs: 1096 >> [ 0.198460] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 50, nr_irqs: 1096 >> [ 0.198489] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 51, nr_irqs: 1096 >> [ 0.198523] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 52, nr_irqs: 1096 >> [ 0.201315] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 53, nr_irqs: 1096 >> [ 0.202174] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 54, nr_irqs: 1096 >> [ 0.202225] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 55, nr_irqs: 1096 >> [ 0.202259] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 56, nr_irqs: 1096 >> [ 0.202291] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 57, nr_irqs: 1096 >> [ 0.205239] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 58, nr_irqs: 1096 >> [ 0.205239] cjq_debug 
__irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 59, nr_irqs: 1096 >> [ 0.205239] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 60, nr_irqs: 1096 >> [ 0.205239] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 61, nr_irqs: 1096 >> [ 0.205239] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 62, nr_irqs: 1096 >> [ 0.208653] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 63, nr_irqs: 1096 >> [ 0.208653] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 64, nr_irqs: 1096 >> [ 0.208653] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 65, nr_irqs: 1096 >> [ 0.208653] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 66, nr_irqs: 1096 >> [ 0.208653] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 67, nr_irqs: 1096 >> [ 0.210169] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 68, nr_irqs: 1096 >> [ 0.210322] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 69, nr_irqs: 1096 >> [ 0.210370] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 70, nr_irqs: 1096 >> [ 0.210403] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 71, nr_irqs: 1096 >> [ 0.210436] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 72, nr_irqs: 1096 >> [ 0.213190] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 73, nr_irqs: 1096 >> [ 0.213190] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 74, nr_irqs: 1096 >> [ 0.213190] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 75, nr_irqs: 1096 >> [ 0.213190] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 76, nr_irqs: 1096 >> [ 0.214151] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 77, nr_irqs: 1096 >> [ 0.217075] cjq_debug 
__irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 78, nr_irqs: 1096 >> [ 0.217075] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 79, nr_irqs: 1096 >> [ 0.217075] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 80, nr_irqs: 1096 >> [ 0.217075] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 81, nr_irqs: 1096 >> [ 0.217075] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 82, nr_irqs: 1096 >> [ 0.220389] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 83, nr_irqs: 1096 >> [ 0.220389] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 84, nr_irqs: 1096 >> [ 0.220389] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 85, nr_irqs: 1096 >> [ 0.220389] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 86, nr_irqs: 1096 >> [ 0.220389] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 87, nr_irqs: 1096 >> [ 0.222215] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 88, nr_irqs: 1096 >> [ 0.222366] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 89, nr_irqs: 1096 >> [ 0.222410] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 90, nr_irqs: 1096 >> [ 0.222447] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 91, nr_irqs: 1096 >> [ 0.222478] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 92, nr_irqs: 1096 >> [ 0.225490] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 93, nr_irqs: 1096 >> [ 0.226225] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 94, nr_irqs: 1096 >> [ 0.226268] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 95, nr_irqs: 1096 >> [ 0.226300] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 96, nr_irqs: 1096 >> [ 0.226329] cjq_debug 
__irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 97, nr_irqs: 1096 >> [ 0.229057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 98, nr_irqs: 1096 >> [ 0.229057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 99, nr_irqs: 1096 >> [ 0.229057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 100, nr_irqs: 1096 >> [ 0.229057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 101, nr_irqs: 1096 >> [ 0.229057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 102, nr_irqs: 1096 >> [ 0.232399] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 103, nr_irqs: 1096 >> [ 0.248854] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 104, nr_irqs: 1096 >> [ 0.250609] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 105, nr_irqs: 1096 >> [ 0.372343] cjq_debug mp_map_pin_to_irq gsi: 9, irq: 9, idx: 1, ioapic: 0, pin: 9 >> [ 0.720950] cjq_debug mp_map_pin_to_irq gsi: 8, irq: 8, idx: 8, ioapic: 0, pin: 8 >> [ 0.721052] cjq_debug mp_map_pin_to_irq gsi: 13, irq: 13, idx: 12, ioapic: 0, pin: 13 >> [ 1.254825] cjq_debug mp_map_pin_to_irq gsi: 7, irq: -16, idx: 7, ioapic: 0, pin: 7 >> [ 1.333081] cjq_debug mp_map_pin_to_irq gsi: 1, irq: 1, idx: 2, ioapic: 0, pin: 1 >> [ 1.375882] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 106, nr_irqs: 1096 >> [ 1.375951] cjq_debug mp_map_pin_to_irq gsi: 32, irq: 106, idx: -1, ioapic: 1, pin: 8 >> [ 1.376072] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 107, nr_irqs: 1096 >> [ 1.376121] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 107, idx: -1, ioapic: 1, pin: 13 >> [ 1.472551] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 107, idx: -1, ioapic: 1, pin: 13 >> [ 1.472697] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 107, nr_irqs: 1096 >> [ 1.472751] cjq_debug mp_map_pin_to_irq gsi: 38, 
irq: 107, idx: -1, ioapic: 1, pin: 14 >> [ 1.484290] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 107, idx: -1, ioapic: 1, pin: 14 >> [ 1.768163] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 107, nr_irqs: 1096 >> [ 1.768627] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 108, nr_irqs: 1096 >> [ 1.769059] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 109, nr_irqs: 1096 >> [ 1.769694] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 110, nr_irqs: 1096 >> [ 1.770169] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 111, nr_irqs: 1096 >> [ 1.770697] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 112, nr_irqs: 1096 >> [ 1.770738] cjq_debug mp_map_pin_to_irq gsi: 28, irq: 112, idx: -1, ioapic: 1, pin: 4 >> [ 1.770789] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 113, nr_irqs: 1096 >> [ 1.771230] cjq_debug mp_map_pin_to_irq gsi: 28, irq: 112, idx: -1, ioapic: 1, pin: 4 >> [ 1.771278] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 114, nr_irqs: 1096 >> [ 2.127884] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 115, nr_irqs: 1096 >> [ 3.207419] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 116, nr_irqs: 1096 >> [ 3.207730] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 116, idx: -1, ioapic: 1, pin: 13 >> [ 3.208120] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 117, nr_irqs: 1096 >> [ 3.208475] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 117, idx: -1, ioapic: 1, pin: 12 >> [ 3.208478] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 118, nr_irqs: 1096 >> [ 3.208861] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 116, idx: -1, ioapic: 1, pin: 13 >> [ 3.208933] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 119, nr_irqs: 1096 >> [ 3.209127] cjq_debug __irq_alloc_descs 
irq: -1, from: 24, cnt: 1, node: -1, start: 120, nr_irqs: 1096 >> [ 3.209383] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 121, nr_irqs: 1096 >> [ 3.209863] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 122, nr_irqs: 1096 >> [ 3.211439] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 123, nr_irqs: 1096 >> [ 3.211833] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 124, nr_irqs: 1096 >> [ 3.212873] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 125, nr_irqs: 1096 >> [ 3.243514] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 126, nr_irqs: 1096 >> [ 3.243689] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 126, idx: -1, ioapic: 1, pin: 14 >> [ 3.244293] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 127, nr_irqs: 1096 >> [ 3.244534] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 128, nr_irqs: 1096 >> [ 3.244714] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 129, nr_irqs: 1096 >> [ 3.244911] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 130, nr_irqs: 1096 >> [ 3.245096] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 131, nr_irqs: 1096 >> [ 3.245633] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 132, nr_irqs: 1096 >> [ 3.247890] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 133, nr_irqs: 1096 >> [ 3.248192] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 134, nr_irqs: 1096 >> [ 3.271093] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 135, nr_irqs: 1096 >> [ 3.307045] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 136, nr_irqs: 1096 >> [ 3.307162] cjq_debug mp_map_pin_to_irq gsi: 48, irq: 136, idx: -1, ioapic: 1, pin: 24 >> [ 3.307223] cjq_debug __irq_alloc_descs irq: -1, from: 24, 
cnt: 1, node: 0, start: 137, nr_irqs: 1096 >> [ 3.331183] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 137, nr_irqs: 1096 >> [ 3.331295] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 138, nr_irqs: 1096 >> [ 3.331366] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 139, nr_irqs: 1096 >> [ 3.331438] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 140, nr_irqs: 1096 >> [ 3.331511] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 141, nr_irqs: 1096 >> [ 3.331579] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 142, nr_irqs: 1096 >> [ 3.331646] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 143, nr_irqs: 1096 >> [ 3.331713] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 144, nr_irqs: 1096 >> [ 3.331780] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 145, nr_irqs: 1096 >> [ 3.331846] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 146, nr_irqs: 1096 >> [ 3.331913] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 147, nr_irqs: 1096 >> [ 3.331984] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 148, nr_irqs: 1096 >> [ 3.332051] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 149, nr_irqs: 1096 >> [ 3.332118] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 150, nr_irqs: 1096 >> [ 3.332183] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 151, nr_irqs: 1096 >> [ 3.332252] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 152, nr_irqs: 1096 >> [ 3.332319] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 153, nr_irqs: 1096 >> [ 8.010370] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 116, idx: -1, ioapic: 1, pin: 13 >> [ 9.545439] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 117, idx: -1, ioapic: 1, 
pin: 12 >> [ 9.545713] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 154, nr_irqs: 1096 >> [ 9.546034] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 155, nr_irqs: 1096 >> [ 9.687796] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 156, nr_irqs: 1096 >> [ 9.687979] cjq_debug mp_map_pin_to_irq gsi: 39, irq: 156, idx: -1, ioapic: 1, pin: 15 >> [ 9.688057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 157, nr_irqs: 1096 >> [ 9.921038] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 158, nr_irqs: 1096 >> [ 9.921210] cjq_debug mp_map_pin_to_irq gsi: 29, irq: 158, idx: -1, ioapic: 1, pin: 5 >> [ 9.921403] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 159, nr_irqs: 1096 >> [ 9.926373] cjq_debug mp_map_pin_to_irq gsi: 39, irq: 156, idx: -1, ioapic: 1, pin: 15 >> [ 9.926747] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 160, nr_irqs: 1096 >> [ 9.928201] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 117, idx: -1, ioapic: 1, pin: 12 >> [ 9.928488] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 161, nr_irqs: 1096 >> [ 10.653915] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 162, nr_irqs: 1096 >> [ 10.656257] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 163, nr_irqs: 1096 >> >> You can see that the allocation of an irq is not always based on the value of the gsi. It follows a first-come, first-served principle: for example, gsi 32 gets irq 106 but gsi 28 gets irq 112. And it is not only acpi_register_gsi_ioapic() that calls into __irq_alloc_descs; other functions call it as well, even earlier. >> The output above behaves like baremetal, so we can conclude irq != gsi. See the output below on Linux: > > It does seem weird to me that it does identity map legacy IRQs (<16), > but then for GSI >= 16 it starts assigning IRQs in the 100 range.
> > What uses the IRQ range [24, 105]? They are allocated to IPIs, MSIs, or event channels, which call __irq_alloc_descs before the pci devices do. For example, see one IPI's callstack: kernel_init kernel_init_freeable smp_prepare_cpus smp_ops.smp_prepare_cpus xen_hvm_smp_prepare_cpus xen_smp_intr_init bind_ipi_to_irqhandler bind_ipi_to_irq xen_allocate_irq_dynamic __irq_alloc_descs > > Also IIRC on a PV dom0 GSIs are identity mapped to IRQs on Linux? Or > maybe that's just a side effect of GSIs being identity mapped into > PIRQs by Xen? PV is different: although IPIs also come before the pci devices, they don't occupy the irqs 24~56. That is because a PV dom0 doesn't call setup_IO_APIC from start_kernel, so the variable "ioapic_initialized" in the function arch_dynirq_lower_bound is not initialized; gsi_top, whose value is 56, is then returned, and irq allocation begins from 56 (but PVH and baremetal do initialize "ioapic_initialized", so arch_dynirq_lower_bound returns ioapic_dynirq_base, whose value is 24). What's more, when PV allocates an irq for a pci device, it calls acpi_register_gsi_xen->irq_alloc_desc_at->__irq_alloc_descs, and irq_alloc_desc_at passes the gsi to __irq_alloc_descs (PVH and baremetal pass -1), so in __irq_alloc_descs the variable "from" equals the gsi; the gsi is between 24~56, and the irqs 24~56 are not occupied beforehand. Then it returns an irq equal to the gsi.
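The arch_dynirq_lower_bound() difference described above can be reduced to a small sketch. This models only the behaviour as explained in this thread (ioapic_initialized left unset on PV dom0, ioapic_dynirq_base = 24, gsi_top = 56); it is not a copy of the actual kernel function:

```c
/* Lower bound for dynamic IRQ allocation, per the explanation above:
 * a PV dom0 never runs setup_IO_APIC(), so ioapic_initialized stays 0
 * and dynamic IRQs start at gsi_top (56), leaving 24-55 free for
 * identity GSI mappings; PVH and baremetal initialize it and start
 * dynamic allocation at ioapic_dynirq_base (24). */
unsigned int toy_dynirq_lower_bound(int ioapic_initialized,
                                    unsigned int ioapic_dynirq_base,
                                    unsigned int gsi_top,
                                    unsigned int from)
{
    unsigned int base = ioapic_initialized ? ioapic_dynirq_base : gsi_top;
    return base > from ? base : from;
}
```

This is why on PV dom0 a GSI in 24~56 can be handed its own number as the IRQ, while on PVH the same GSI collides with descriptors already consumed from 24 upward.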
>
>> [    0.105053] cjq_debug mp_map_pin_to_irq gsi: 1, irq: 1, idx: 2, ioapic: 0, pin: 1
>> [    0.105061] cjq_debug mp_map_pin_to_irq gsi: 2, irq: 0, idx: 0, ioapic: 0, pin: 2
>> [    0.105069] cjq_debug mp_map_pin_to_irq gsi: 3, irq: 3, idx: 3, ioapic: 0, pin: 3
>> [    0.105078] cjq_debug mp_map_pin_to_irq gsi: 4, irq: 4, idx: 4, ioapic: 0, pin: 4
>> [    0.105086] cjq_debug mp_map_pin_to_irq gsi: 5, irq: 5, idx: 5, ioapic: 0, pin: 5
>> [    0.105094] cjq_debug mp_map_pin_to_irq gsi: 6, irq: 6, idx: 6, ioapic: 0, pin: 6
>> [    0.105103] cjq_debug mp_map_pin_to_irq gsi: 7, irq: 7, idx: 7, ioapic: 0, pin: 7
>> [    0.105111] cjq_debug mp_map_pin_to_irq gsi: 8, irq: 8, idx: 8, ioapic: 0, pin: 8
>> [    0.105119] cjq_debug mp_map_pin_to_irq gsi: 9, irq: 9, idx: 1, ioapic: 0, pin: 9
>> [    0.105127] cjq_debug mp_map_pin_to_irq gsi: 10, irq: 10, idx: 9, ioapic: 0, pin: 10
>> [    0.105136] cjq_debug mp_map_pin_to_irq gsi: 11, irq: 11, idx: 10, ioapic: 0, pin: 11
>> [    0.105144] cjq_debug mp_map_pin_to_irq gsi: 12, irq: 12, idx: 11, ioapic: 0, pin: 12
>> [    0.105152] cjq_debug mp_map_pin_to_irq gsi: 13, irq: 13, idx: 12, ioapic: 0, pin: 13
>> [    0.105160] cjq_debug mp_map_pin_to_irq gsi: 14, irq: 14, idx: 13, ioapic: 0, pin: 14
>> [    0.105169] cjq_debug mp_map_pin_to_irq gsi: 15, irq: 15, idx: 14, ioapic: 0, pin: 15
>> [    0.398134] cjq_debug mp_map_pin_to_irq gsi: 9, irq: 9, idx: 1, ioapic: 0, pin: 9
>> [    1.169293] cjq_debug mp_map_pin_to_irq gsi: 8, irq: 8, idx: 8, ioapic: 0, pin: 8
>> [    1.169394] cjq_debug mp_map_pin_to_irq gsi: 13, irq: 13, idx: 12, ioapic: 0, pin: 13
>> [    1.323132] cjq_debug mp_map_pin_to_irq gsi: 7, irq: 7, idx: 7, ioapic: 0, pin: 7
>> [    1.345425] cjq_debug mp_map_pin_to_irq gsi: 1, irq: 1, idx: 2, ioapic: 0, pin: 1
>> [    1.375502] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 24, nr_irqs: 1096
>> [    1.375575] cjq_debug mp_map_pin_to_irq gsi: 32, irq: 24, idx: -1, ioapic: 1, pin: 8
>> [    1.375661] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 25, nr_irqs: 1096
>> [    1.375705] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 25, idx: -1, ioapic: 1, pin: 13
>> [    1.442277] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 25, idx: -1, ioapic: 1, pin: 13
>> [    1.442393] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 25, nr_irqs: 1096
>> [    1.442450] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 25, idx: -1, ioapic: 1, pin: 14
>> [    1.453893] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 25, idx: -1, ioapic: 1, pin: 14
>> [    1.456127] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 25, nr_irqs: 1096
>> [    1.734065] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 26, nr_irqs: 1096
>> [    1.734165] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 27, nr_irqs: 1096
>> [    1.734253] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 28, nr_irqs: 1096
>> [    1.734344] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 29, nr_irqs: 1096
>> [    1.734426] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 30, nr_irqs: 1096
>> [    1.734512] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 31, nr_irqs: 1096
>> [    1.734597] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 32, nr_irqs: 1096
>> [    1.734643] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 33, nr_irqs: 1096
>> [    1.734687] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 34, nr_irqs: 1096
>> [    1.734728] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 35, nr_irqs: 1096
>> [    1.735017] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 36, nr_irqs: 1096
>> [    1.735252] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 37, nr_irqs: 1096
>> [    1.735467] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 38, nr_irqs: 1096
>> [    1.735799] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 39, nr_irqs: 1096
>> [    1.736024] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 40, nr_irqs: 1096
>> [    1.736364] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 41, nr_irqs: 1096
>> [    1.736406] cjq_debug mp_map_pin_to_irq gsi: 28, irq: 41, idx: -1, ioapic: 1, pin: 4
>> [    1.736434] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 42, nr_irqs: 1096
>> [    1.736701] cjq_debug mp_map_pin_to_irq gsi: 28, irq: 41, idx: -1, ioapic: 1, pin: 4
>> [    1.736724] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 43, nr_irqs: 1096
>> [    3.037123] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 44, nr_irqs: 1096
>> [    3.037313] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 44, idx: -1, ioapic: 1, pin: 13
>> [    3.037515] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 44, idx: -1, ioapic: 1, pin: 13
>> [    3.037738] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 45, nr_irqs: 1096
>> [    3.037959] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 46, nr_irqs: 1096
>> [    3.038073] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 47, nr_irqs: 1096
>> [    3.038154] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 48, nr_irqs: 1096
>> [    3.038179] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 47, idx: -1, ioapic: 1, pin: 12
>> [    3.038277] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 49, nr_irqs: 1096
>> [    3.038399] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 50, nr_irqs: 1096
>> [    3.038525] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 51, nr_irqs: 1096
>> [    3.038657] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 52, nr_irqs: 1096
>> [    3.038852] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 53, nr_irqs: 1096
>> [    3.052377] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 54, nr_irqs: 1096
>> [    3.052479] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 54, idx: -1, ioapic: 1, pin: 14
>> [    3.052730] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 55, nr_irqs: 1096
>> [    3.052840] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 56, nr_irqs: 1096
>> [    3.052918] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 57, nr_irqs: 1096
>> [    3.052987] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 58, nr_irqs: 1096
>> [    3.053069] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 59, nr_irqs: 1096
>> [    3.053139] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 60, nr_irqs: 1096
>> [    3.053201] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 61, nr_irqs: 1096
>> [    3.053260] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 62, nr_irqs: 1096
>> [    3.089128] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 63, nr_irqs: 1096
>> [    3.089310] cjq_debug mp_map_pin_to_irq gsi: 48, irq: 63, idx: -1, ioapic: 1, pin: 24
>> [    3.089376] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 64, nr_irqs: 1096
>> [    3.103435] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 65, nr_irqs: 1096
>> [    3.114190] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 64, nr_irqs: 1096
>> [    3.114346] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 66, nr_irqs: 1096
>> [    3.121215] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 67, nr_irqs: 1096
>> [    3.121350] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 68, nr_irqs: 1096
>> [    3.121479] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 69, nr_irqs: 1096
>> [    3.121612] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 70, nr_irqs: 1096
>> [    3.121726] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 71, nr_irqs: 1096
>> [    3.121841] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 72, nr_irqs: 1096
>> [    3.121955] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 73, nr_irqs: 1096
>> [    3.122025] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 74, nr_irqs: 1096
>> [    3.122093] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 75, nr_irqs: 1096
>> [    3.122148] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 76, nr_irqs: 1096
>> [    3.122203] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 77, nr_irqs: 1096
>> [    3.122265] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 78, nr_irqs: 1096
>> [    3.122322] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 79, nr_irqs: 1096
>> [    3.122378] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 80, nr_irqs: 1096
>> [    3.122433] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 81, nr_irqs: 1096
>> [    7.838753] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 44, idx: -1, ioapic: 1, pin: 13
>> [    9.619174] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 47, idx: -1, ioapic: 1, pin: 12
>> [    9.619556] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 82, nr_irqs: 1096
>> [    9.622038] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 83, nr_irqs: 1096
>> [    9.634900] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 84, nr_irqs: 1096
>> [    9.635316] cjq_debug mp_map_pin_to_irq gsi: 39, irq: 84, idx: -1, ioapic: 1, pin: 15
>> [    9.635405] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 85, nr_irqs: 1096
>> [   10.006686] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 86, nr_irqs: 1096
>> [   10.006823] cjq_debug mp_map_pin_to_irq gsi: 29, irq: 86, idx: -1, ioapic: 1, pin: 5
>> [   10.007009] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 87, nr_irqs: 1096
>> [   10.008723] cjq_debug mp_map_pin_to_irq gsi: 39, irq: 84, idx: -1, ioapic: 1, pin: 15
>> [   10.009853] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 88, nr_irqs: 1096
>> [   10.010786] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 47, idx: -1, ioapic: 1, pin: 12
>> [   10.010858] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 89, nr_irqs: 1096
>>
>> 2. Why I do the translations between irq and gsi?
>>
>> After answering question 1, we get irq != gsi. And I found that in QEMU, pci_qdev_realize -> xen_pt_realize -> xen_host_pci_device_get -> xen_host_pci_get_hex_value will get the irq number, but later pci_qdev_realize -> xen_pt_realize -> xc_physdev_map_pirq requires us to pass in the gsi,
>
> So that's quite a difference. For some reason on a PV dom0
> xen_host_pci_get_hex_value will return the IRQ that's identity mapped
> to the GSI.
>
> Is that because a PV dom0 will use acpi_register_gsi_xen() instead of
> acpi_register_gsi_ioapic()?

Not quite: PV gets the irq from /sys/bus/pci/devices/xxxx:xx:xx.x/irq (see xen_pt_realize -> xen_host_pci_device_get -> xen_host_pci_get_dec_value -> xen_host_pci_get_value -> open), and it treats that irq as the gsi.

>
>> it will call into Xen physdev_map_pirq -> allocate_and_map_gsi_pirq to allocate a pirq for the gsi. And that is where the error occurred.
>> Not only that, the callback function pci_add_dm_done -> xc_physdev_map_pirq also needs the gsi.
>>
>> So, I added the function xc_physdev_gsi_from_irq() to translate irq to gsi, for QEMU.
>>
>> And I didn't find similar functions in the existing Linux code, and I think only "QEMU passthrough for Xen" needs this translation, so I added it into privcmd. If you know of any similar functions or a more suitable place, please feel free to tell me.
>>
>> 3. Why I call PHYSDEVOP_map_pirq in acpi_register_gsi_xen_pvh()?
>>
>> Because if you want to map a gsi for a domU, it must already have a mapping in dom0. See the QEMU code:
>>     pci_add_dm_done
>>         xc_physdev_map_pirq
>>         xc_domain_irq_permission
>>             XEN_DOMCTL_irq_permission
>>                 pirq_access_permitted
>> xc_physdev_map_pirq will get the pirq mapped from the gsi, and xc_domain_irq_permission will use that pirq to call into Xen. If we don't do PHYSDEVOP_map_pirq for passthrough devices on PVH dom0, then pirq_access_permitted will get a NULL irq from dom0 and fail.
>
> I'm not sure of this specific case, but we shouldn't attempt to fit
> the same exact PCI pass through workflow that a PV dom0 uses into a
> PVH dom0. IOW: it might make sense to diverge some paths in order to
> avoid importing PV specific concepts into PVH without a reason.

Yes, I agree with you. I have also tried another method to solve this problem; I think we can discuss it in the new series.

>
>> So, I added PHYSDEVOP_map_pirq for PVH dom0. But I think it is only necessary for passthrough devices, not for all devices which call __acpi_register_gsi. In the next version of the patch, I will restrict PHYSDEVOP_map_pirq to passthrough devices only.
>>
>> 4. Why I call PHYSDEVOP_setup_gsi in acpi_register_gsi_xen_pvh()?
>>
>> As in Roger's comments, the gsi of a passthrough device is not unmasked and registered (I added printks in vioapic_hwdom_map_gsi() and found that it is never called for the dGPU with gsi 28 in my environment).
>> So, I called PHYSDEVOP_setup_gsi to register the gsi.
>> But I agree with Roger's and Jan's opinion that it is wrong to do PHYSDEVOP_setup_gsi for all devices.
>> So, in the next version of the patch, I will also restrict PHYSDEVOP_setup_gsi to passthrough devices only.
>
> Right, given how long it's been since the last series, I think we need
> a new series posted in order to see how this looks now.

Agreed, I am looking forward to your comments on the new series.

>
> Thanks, Roger.
diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index f4c4f17545..47cf2799bf 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -1486,6 +1486,7 @@ static void pci_add_dm_done(libxl__egc *egc,
         goto out_no_irq;
     }
     if ((fscanf(f, "%u", &irq) == 1) && irq) {
+        irq = xc_physdev_gsi_from_irq(ctx->xch, irq);
         r = xc_physdev_map_pirq(ctx->xch, domid, irq, &irq);
         if (r < 0) {
             LOGED(ERROR, domainid, "xc_physdev_map_pirq irq=%d (error=%d)",