Message ID | c4c582970fbeaf4b6000845c400aa4c6b7bb2f13.1682615447.git.reinette.chatre@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | vfio/pci: Support dynamic allocation of MSI-X interrupts | expand |
> From: Chatre, Reinette <reinette.chatre@intel.com> > Sent: Friday, April 28, 2023 1:36 AM > > pci_msix_alloc_irq_at() enables an individual MSI-X interrupt to be > allocated after MSI-X enabling. > > Use dynamic MSI-X (if supported by the device) to allocate an interrupt > after MSI-X is enabled. An MSI-X interrupt is dynamically allocated at > the time a valid eventfd is assigned. This is different behavior from > a range provided during MSI-X enabling where interrupts are allocated > for the entire range whether a valid eventfd is provided for each > interrupt or not. > > The PCI-MSIX API requires that some number of irqs are allocated for > an initial set of vectors when enabling MSI-X on the device. When > dynamic MSIX allocation is not supported, the vector table, and thus > the allocated irq set can only be resized by disabling and re-enabling > MSI-X with a different range. In that case the irq allocation is > essentially a cache for configuring vectors within the previously > allocated vector range. When dynamic MSI-X allocation is supported, > the API still requires some initial set of irqs to be allocated, but > also supports allocating and freeing specific irq vectors both > within and beyond the initially allocated range. > > For consistency between modes, as well as to reduce latency and improve > reliability of allocations, and also simplicity, this implementation > only releases irqs via pci_free_irq_vectors() when either the interrupt > mode changes or the device is released. It improves the reliability of allocations from the calling device p.o.v. But system-wide this is not efficient use of irqs and not releasing them timely may affect the reliability of allocations for other devices. Should this behavior be something configurable? > > +/* > + * Return Linux IRQ number of an MSI or MSI-X device interrupt vector. > + * If a Linux IRQ number is not available then a new interrupt will be > + * allocated if dynamic MSI-X is supported. > + */ > +static int vfio_msi_alloc_irq(struct vfio_pci_core_device *vdev, > + unsigned int vector, bool msix) > +{ > + struct pci_dev *pdev = vdev->pdev; > + struct msi_map map; > + int irq; > + u16 cmd; > + > + irq = pci_irq_vector(pdev, vector); > + if (irq > 0 || !msix || !vdev->has_dyn_msix) > + return irq; if (irq >= 0 || ...) > + > +/* > + * Where is vfio_msi_free_irq() ? > + * > + * Allocated interrupts are maintained, essentially forming a cache that > + * subsequent allocations can draw from. Interrupts are freed using > + * pci_free_irq_vectors() when MSI/MSI-X is disabled. > + */ Probably merge it with the comment of vfio_msi_alloc_irq()? > @@ -401,6 +430,12 @@ static int vfio_msi_set_vector_signal(struct > vfio_pci_core_device *vdev, > if (fd < 0) > return 0; > > + if (irq == -EINVAL) { > + irq = vfio_msi_alloc_irq(vdev, vector, msix); > + if (irq < 0) > + return irq; > + } > + > ctx = vfio_irq_ctx_alloc(vdev, vector); > if (!ctx) > return -ENOMEM; This doesn't read clean that an irq is allocated but not released in the error unwind.
Hi Kevin, On 4/27/2023 11:50 PM, Tian, Kevin wrote: >> From: Chatre, Reinette <reinette.chatre@intel.com> >> Sent: Friday, April 28, 2023 1:36 AM >> >> pci_msix_alloc_irq_at() enables an individual MSI-X interrupt to be >> allocated after MSI-X enabling. >> >> Use dynamic MSI-X (if supported by the device) to allocate an interrupt >> after MSI-X is enabled. An MSI-X interrupt is dynamically allocated at >> the time a valid eventfd is assigned. This is different behavior from >> a range provided during MSI-X enabling where interrupts are allocated >> for the entire range whether a valid eventfd is provided for each >> interrupt or not. >> >> The PCI-MSIX API requires that some number of irqs are allocated for >> an initial set of vectors when enabling MSI-X on the device. When >> dynamic MSIX allocation is not supported, the vector table, and thus >> the allocated irq set can only be resized by disabling and re-enabling >> MSI-X with a different range. In that case the irq allocation is >> essentially a cache for configuring vectors within the previously >> allocated vector range. When dynamic MSI-X allocation is supported, >> the API still requires some initial set of irqs to be allocated, but >> also supports allocating and freeing specific irq vectors both >> within and beyond the initially allocated range. >> >> For consistency between modes, as well as to reduce latency and improve >> reliability of allocations, and also simplicity, this implementation >> only releases irqs via pci_free_irq_vectors() when either the interrupt >> mode changes or the device is released. > > It improves the reliability of allocations from the calling device p.o.v. > > But system-wide this is not efficient use of irqs and not releasing them > timely may affect the reliability of allocations for other devices. Could you please elaborate how other devices may be impacted? > Should this behavior be something configurable? This is not clear to me and I look to you for guidance here. From practical side it looks like configuration via module parameters is supported but whether it should be done is not clear to me. When considering this we need to think about what the user may expect when turning on/off the configuration. For example, MSI-X continues to allocate a range of interrupts during enabling. These have always been treated as a "cache" (interrupts remain allocated, whether they have an associated trigger or not). If there is new configurable behavior, do you expect that the driver needs to distinguish between the original "cache" that the user is used to and the new dynamic allocations? That is, should a dynamic MSI-X capable device always free interrupts when user space removes an eventfd or should only interrupts that were allocated dynamically be freed dynamically? >> +/* >> + * Return Linux IRQ number of an MSI or MSI-X device interrupt vector. >> + * If a Linux IRQ number is not available then a new interrupt will be >> + * allocated if dynamic MSI-X is supported. >> + */ >> +static int vfio_msi_alloc_irq(struct vfio_pci_core_device *vdev, >> + unsigned int vector, bool msix) >> +{ >> + struct pci_dev *pdev = vdev->pdev; >> + struct msi_map map; >> + int irq; >> + u16 cmd; >> + >> + irq = pci_irq_vector(pdev, vector); >> + if (irq > 0 || !msix || !vdev->has_dyn_msix) >> + return irq; > > if (irq >= 0 || ...) > I am not sure about this request because pci_irq_vector() cannot return 0. The Linux interrupt number will be > 0 on success. 0 means "not found" (see msi_get_virq()), which is translated to -EINVAL by pci_irq_vector(). >> + >> +/* >> + * Where is vfio_msi_free_irq() ? >> + * >> + * Allocated interrupts are maintained, essentially forming a cache that >> + * subsequent allocations can draw from. Interrupts are freed using >> + * pci_free_irq_vectors() when MSI/MSI-X is disabled. >> + */ > > Probably merge it with the comment of vfio_msi_alloc_irq()? Sure, will do. > >> @@ -401,6 +430,12 @@ static int vfio_msi_set_vector_signal(struct >> vfio_pci_core_device *vdev, >> if (fd < 0) >> return 0; >> >> + if (irq == -EINVAL) { >> + irq = vfio_msi_alloc_irq(vdev, vector, msix); >> + if (irq < 0) >> + return irq; >> + } >> + >> ctx = vfio_irq_ctx_alloc(vdev, vector); >> if (!ctx) >> return -ENOMEM; > > This doesn't read clean that an irq is allocated but not released > in the error unwind. I can add a comment similar to the location where the trigger is released: /* Interrupt stays allocated, will be freed at MSI-X disable. */ Reinette
> From: Chatre, Reinette <reinette.chatre@intel.com> > Sent: Saturday, April 29, 2023 2:35 AM > > Hi Kevin, > > On 4/27/2023 11:50 PM, Tian, Kevin wrote: > >> From: Chatre, Reinette <reinette.chatre@intel.com> > >> Sent: Friday, April 28, 2023 1:36 AM > >> > >> pci_msix_alloc_irq_at() enables an individual MSI-X interrupt to be > >> allocated after MSI-X enabling. > >> > >> Use dynamic MSI-X (if supported by the device) to allocate an interrupt > >> after MSI-X is enabled. An MSI-X interrupt is dynamically allocated at > >> the time a valid eventfd is assigned. This is different behavior from > >> a range provided during MSI-X enabling where interrupts are allocated > >> for the entire range whether a valid eventfd is provided for each > >> interrupt or not. > >> > >> The PCI-MSIX API requires that some number of irqs are allocated for > >> an initial set of vectors when enabling MSI-X on the device. When > >> dynamic MSIX allocation is not supported, the vector table, and thus > >> the allocated irq set can only be resized by disabling and re-enabling > >> MSI-X with a different range. In that case the irq allocation is > >> essentially a cache for configuring vectors within the previously > >> allocated vector range. When dynamic MSI-X allocation is supported, > >> the API still requires some initial set of irqs to be allocated, but > >> also supports allocating and freeing specific irq vectors both > >> within and beyond the initially allocated range. > >> > >> For consistency between modes, as well as to reduce latency and improve > >> reliability of allocations, and also simplicity, this implementation > >> only releases irqs via pci_free_irq_vectors() when either the interrupt > >> mode changes or the device is released. > > > > It improves the reliability of allocations from the calling device p.o.v. > > > > But system-wide this is not efficient use of irqs and not releasing them > > timely may affect the reliability of allocations for other devices. > > Could you please elaborate how other devices may be impacted? the more this devices reserves the less remains for others, e.g. irte entries. > > > Should this behavior be something configurable? > > This is not clear to me and I look to you for guidance here. From practical > side it looks like configuration via module parameters is supported but > whether it should be done is not clear to me. > > When considering this we need to think about what the user may expect > when > turning on/off the configuration. For example, MSI-X continues to allocate a > range of interrupts during enabling. These have always been treated as a > "cache" (interrupts remain allocated, whether they have an associated > trigger > or not). If there is new configurable behavior, do you expect that the > driver needs to distinguish between the original "cache" that the user is > used to and the new dynamic allocations? That is, should a dynamic MSI-X > capable device always free interrupts when user space removes an eventfd > or should only interrupts that were allocated dynamically be freed > dynamically? That looks tricky. Probably that is why Alex suggested doing this simple scheme and it is on par with the old logic anyway. So I'll withdraw this comment. > > >> +/* > >> + * Return Linux IRQ number of an MSI or MSI-X device interrupt vector. > >> + * If a Linux IRQ number is not available then a new interrupt will be > >> + * allocated if dynamic MSI-X is supported. > >> + */ > >> +static int vfio_msi_alloc_irq(struct vfio_pci_core_device *vdev, > >> + unsigned int vector, bool msix) > >> +{ > >> + struct pci_dev *pdev = vdev->pdev; > >> + struct msi_map map; > >> + int irq; > >> + u16 cmd; > >> + > >> + irq = pci_irq_vector(pdev, vector); > >> + if (irq > 0 || !msix || !vdev->has_dyn_msix) > >> + return irq; > > > > if (irq >= 0 || ...) > > > > I am not sure about this request because pci_irq_vector() cannot return 0. > The Linux interrupt number will be > 0 on success. 0 means "not found" > (see msi_get_virq()), which is translated to -EINVAL by pci_irq_vector(). > There is a subtle difference between the description and the code of pci_irq_vector(). /** * pci_irq_vector() - Get Linux IRQ number of a device interrupt vector * @dev: the PCI device to operate on * @nr: device-relative interrupt vector index (0-based); has different * meanings, depending on interrupt mode: * * * MSI-X the index in the MSI-X vector table * * MSI the index of the enabled MSI vectors * * INTx must be 0 * * Return: the Linux IRQ number, or -EINVAL if @nr is out of range */ From above '0' is a valid irq number. then in following code: irq = msi_get_virq(&dev->dev, nr); return irq ? irq : -EINVAL; '0' is obviously invalid for msi. I didn't realize the msi part when reading the patch. It left me in confusion that '0' is unhandled as here we only check ">0" while in other places "-EINVAL" is checked. Not big matter but it sounds slightly clearer to me to follow the description of pci_irq_vector() instead of its internal detail.
On Fri, 5 May 2023 08:10:33 +0000 "Tian, Kevin" <kevin.tian@intel.com> wrote: > > From: Chatre, Reinette <reinette.chatre@intel.com> > > Sent: Saturday, April 29, 2023 2:35 AM > > > > Hi Kevin, > > > > On 4/27/2023 11:50 PM, Tian, Kevin wrote: > > >> From: Chatre, Reinette <reinette.chatre@intel.com> > > >> Sent: Friday, April 28, 2023 1:36 AM > > >> > > >> pci_msix_alloc_irq_at() enables an individual MSI-X interrupt to be > > >> allocated after MSI-X enabling. > > >> > > >> Use dynamic MSI-X (if supported by the device) to allocate an interrupt > > >> after MSI-X is enabled. An MSI-X interrupt is dynamically allocated at > > >> the time a valid eventfd is assigned. This is different behavior from > > >> a range provided during MSI-X enabling where interrupts are allocated > > >> for the entire range whether a valid eventfd is provided for each > > >> interrupt or not. > > >> > > >> The PCI-MSIX API requires that some number of irqs are allocated for > > >> an initial set of vectors when enabling MSI-X on the device. When > > >> dynamic MSIX allocation is not supported, the vector table, and thus > > >> the allocated irq set can only be resized by disabling and re-enabling > > >> MSI-X with a different range. In that case the irq allocation is > > >> essentially a cache for configuring vectors within the previously > > >> allocated vector range. When dynamic MSI-X allocation is supported, > > >> the API still requires some initial set of irqs to be allocated, but > > >> also supports allocating and freeing specific irq vectors both > > >> within and beyond the initially allocated range. > > >> > > >> For consistency between modes, as well as to reduce latency and improve > > >> reliability of allocations, and also simplicity, this implementation > > >> only releases irqs via pci_free_irq_vectors() when either the interrupt > > >> mode changes or the device is released. > > > > > > It improves the reliability of allocations from the calling device p.o.v. > > > > > > But system-wide this is not efficient use of irqs and not releasing them > > > timely may affect the reliability of allocations for other devices. > > > > Could you please elaborate how other devices may be impacted? > > the more this devices reserves the less remains for others, e.g. irte entries. > > > > > > Should this behavior be something configurable? > > > > This is not clear to me and I look to you for guidance here. From practical > > side it looks like configuration via module parameters is supported but > > whether it should be done is not clear to me. > > > > When considering this we need to think about what the user may expect > > when > > turning on/off the configuration. For example, MSI-X continues to allocate a > > range of interrupts during enabling. These have always been treated as a > > "cache" (interrupts remain allocated, whether they have an associated > > trigger > > or not). If there is new configurable behavior, do you expect that the > > driver needs to distinguish between the original "cache" that the user is > > used to and the new dynamic allocations? That is, should a dynamic MSI-X > > capable device always free interrupts when user space removes an eventfd > > or should only interrupts that were allocated dynamically be freed > > dynamically? > > That looks tricky. Probably that is why Alex suggested doing this simple > scheme and it is on par with the old logic anyway. So I'll withdraw this > comment. Don't forget we're also releasing the irq reservations when the guest changes interrupt mode, ex. reboot, so the "caching" is really only within a session of the guest/userspace driver where it would be unusual to have an unused reservation for an extended period. Thanks, Alex
Hi Kevin, On 5/5/2023 1:10 AM, Tian, Kevin wrote: >> From: Chatre, Reinette <reinette.chatre@intel.com> >> Sent: Saturday, April 29, 2023 2:35 AM >> On 4/27/2023 11:50 PM, Tian, Kevin wrote: >>>> From: Chatre, Reinette <reinette.chatre@intel.com> >>>> Sent: Friday, April 28, 2023 1:36 AM ... >>>> +/* >>>> + * Return Linux IRQ number of an MSI or MSI-X device interrupt vector. >>>> + * If a Linux IRQ number is not available then a new interrupt will be >>>> + * allocated if dynamic MSI-X is supported. >>>> + */ >>>> +static int vfio_msi_alloc_irq(struct vfio_pci_core_device *vdev, >>>> + unsigned int vector, bool msix) >>>> +{ >>>> + struct pci_dev *pdev = vdev->pdev; >>>> + struct msi_map map; >>>> + int irq; >>>> + u16 cmd; >>>> + >>>> + irq = pci_irq_vector(pdev, vector); >>>> + if (irq > 0 || !msix || !vdev->has_dyn_msix) >>>> + return irq; >>> >>> if (irq >= 0 || ...) >>> >> >> I am not sure about this request because pci_irq_vector() cannot return 0. >> The Linux interrupt number will be > 0 on success. 0 means "not found" >> (see msi_get_virq()), which is translated to -EINVAL by pci_irq_vector(). >> > > There is a subtle difference between the description and the code of > pci_irq_vector(). > > /** > * pci_irq_vector() - Get Linux IRQ number of a device interrupt vector > * @dev: the PCI device to operate on > * @nr: device-relative interrupt vector index (0-based); has different > * meanings, depending on interrupt mode: > * > * * MSI-X the index in the MSI-X vector table > * * MSI the index of the enabled MSI vectors > * * INTx must be 0 > * > * Return: the Linux IRQ number, or -EINVAL if @nr is out of range > */ > > From above '0' is a valid irq number. > > then in following code: > > irq = msi_get_virq(&dev->dev, nr); > return irq ? irq : -EINVAL; > > '0' is obviously invalid for msi. > > I didn't realize the msi part when reading the patch. It left me in > confusion that '0' is unhandled as here we only check ">0" while in > other places "-EINVAL" is checked. > > Not big matter but it sounds slightly clearer to me to follow the > description of pci_irq_vector() instead of its internal detail. I can add an explicit check for '0' and, as you confirmed, this is invalid for MSI and thus I think it should be treated as an error. This is perhaps another candidate for a WARN considering that pci_irq_vector() returning a '0' for MSI indicates a kernel problem . I now consider taking guidance from pci_irq_get_affinity(). Note that pci_irq_get_affinity() contains: const struct cpumask *pci_irq_get_affinity(struct pci_dev *dev, int nr) { int idx, irq = pci_irq_vector(dev, nr); ... if (WARN_ON_ONCE(irq <= 0)) return NULL; ... } Would you be ok with something like below? diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c index b549f5c97cb8..a8e96254f953 100644 --- a/drivers/vfio/pci/vfio_pci_intrs.c +++ b/drivers/vfio/pci/vfio_pci_intrs.c @@ -393,6 +393,8 @@ static int vfio_msi_alloc_irq(struct vfio_pci_core_device *vdev, u16 cmd; irq = pci_irq_vector(pdev, vector); + if (WARN_ON_ONCE(irq == 0)) + return -EINVAL; if (irq > 0 || !msix || !vdev->has_dyn_msix) return irq; I would prefer that vfio_msi_alloc_irq() returns negative errors. This enables callers to in turn just return the error code on failure (note that dynamic allocation can return different error codes), not needing to translate 0 into an error. Reinette
> From: Chatre, Reinette <reinette.chatre@intel.com> > Sent: Saturday, May 6, 2023 1:21 AM > > Would you be ok with something like below? > > diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c > index b549f5c97cb8..a8e96254f953 100644 > --- a/drivers/vfio/pci/vfio_pci_intrs.c > +++ b/drivers/vfio/pci/vfio_pci_intrs.c > @@ -393,6 +393,8 @@ static int vfio_msi_alloc_irq(struct > vfio_pci_core_device *vdev, > u16 cmd; > > irq = pci_irq_vector(pdev, vector); > + if (WARN_ON_ONCE(irq == 0)) > + return -EINVAL; > if (irq > 0 || !msix || !vdev->has_dyn_msix) > return irq; > > I would prefer that vfio_msi_alloc_irq() returns negative errors. This enables > callers to in turn just return the error code on failure (note that dynamic > allocation can return different error codes), not needing to translate 0 into > an error. > This looks good to me. Thanks.
> From: Alex Williamson <alex.williamson@redhat.com> > Sent: Friday, May 5, 2023 11:28 PM > > On Fri, 5 May 2023 08:10:33 +0000 > "Tian, Kevin" <kevin.tian@intel.com> wrote: > > > > From: Chatre, Reinette <reinette.chatre@intel.com> > > > Sent: Saturday, April 29, 2023 2:35 AM > > > > > > Hi Kevin, > > > > > > On 4/27/2023 11:50 PM, Tian, Kevin wrote: > > > > > > > Should this behavior be something configurable? > > > > > > This is not clear to me and I look to you for guidance here. From practical > > > side it looks like configuration via module parameters is supported but > > > whether it should be done is not clear to me. > > > > > > When considering this we need to think about what the user may expect > > > when > > > turning on/off the configuration. For example, MSI-X continues to > allocate a > > > range of interrupts during enabling. These have always been treated as a > > > "cache" (interrupts remain allocated, whether they have an associated > > > trigger > > > or not). If there is new configurable behavior, do you expect that the > > > driver needs to distinguish between the original "cache" that the user is > > > used to and the new dynamic allocations? That is, should a dynamic MSI- > X > > > capable device always free interrupts when user space removes an > eventfd > > > or should only interrupts that were allocated dynamically be freed > > > dynamically? > > > > That looks tricky. Probably that is why Alex suggested doing this simple > > scheme and it is on par with the old logic anyway. So I'll withdraw this > > comment. > > Don't forget we're also releasing the irq reservations when the guest > changes interrupt mode, ex. reboot, so the "caching" is really only > within a session of the guest/userspace driver where it would be > unusual to have an unused reservation for an extended period. Thanks, > Yeah, that makes sense.
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c index bdda7f46c2be..8340135b09fa 100644 --- a/drivers/vfio/pci/vfio_pci_intrs.c +++ b/drivers/vfio/pci/vfio_pci_intrs.c @@ -372,27 +372,56 @@ static int vfio_msi_enable(struct vfio_pci_core_device *vdev, int nvec, bool msi return 0; } +/* + * Return Linux IRQ number of an MSI or MSI-X device interrupt vector. + * If a Linux IRQ number is not available then a new interrupt will be + * allocated if dynamic MSI-X is supported. + */ +static int vfio_msi_alloc_irq(struct vfio_pci_core_device *vdev, + unsigned int vector, bool msix) +{ + struct pci_dev *pdev = vdev->pdev; + struct msi_map map; + int irq; + u16 cmd; + + irq = pci_irq_vector(pdev, vector); + if (irq > 0 || !msix || !vdev->has_dyn_msix) + return irq; + + cmd = vfio_pci_memory_lock_and_enable(vdev); + map = pci_msix_alloc_irq_at(pdev, vector, NULL); + vfio_pci_memory_unlock_and_restore(vdev, cmd); + + return map.index < 0 ? map.index : map.virq; +} + +/* + * Where is vfio_msi_free_irq() ? + * + * Allocated interrupts are maintained, essentially forming a cache that + * subsequent allocations can draw from. Interrupts are freed using + * pci_free_irq_vectors() when MSI/MSI-X is disabled. + */ + static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev, unsigned int vector, int fd, bool msix) { struct pci_dev *pdev = vdev->pdev; struct vfio_pci_irq_ctx *ctx; struct eventfd_ctx *trigger; - int irq, ret; + int irq = -EINVAL, ret; u16 cmd; - irq = pci_irq_vector(pdev, vector); - if (irq < 0) - return -EINVAL; - ctx = vfio_irq_ctx_get(vdev, vector); if (ctx) { irq_bypass_unregister_producer(&ctx->producer); - + irq = pci_irq_vector(pdev, vector); cmd = vfio_pci_memory_lock_and_enable(vdev); free_irq(irq, ctx->trigger); vfio_pci_memory_unlock_and_restore(vdev, cmd); + /* Interrupt stays allocated, will be freed at MSI-X disable. */ kfree(ctx->name); eventfd_ctx_put(ctx->trigger); vfio_irq_ctx_free(vdev, ctx, vector); @@ -401,6 +430,12 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_core_device *vdev, if (fd < 0) return 0; + if (irq == -EINVAL) { + irq = vfio_msi_alloc_irq(vdev, vector, msix); + if (irq < 0) + return irq; + } + ctx = vfio_irq_ctx_alloc(vdev, vector); if (!ctx) return -ENOMEM;
pci_msix_alloc_irq_at() enables an individual MSI-X interrupt to be allocated after MSI-X enabling. Use dynamic MSI-X (if supported by the device) to allocate an interrupt after MSI-X is enabled. An MSI-X interrupt is dynamically allocated at the time a valid eventfd is assigned. This is different behavior from a range provided during MSI-X enabling where interrupts are allocated for the entire range whether a valid eventfd is provided for each interrupt or not. The PCI-MSIX API requires that some number of irqs are allocated for an initial set of vectors when enabling MSI-X on the device. When dynamic MSIX allocation is not supported, the vector table, and thus the allocated irq set can only be resized by disabling and re-enabling MSI-X with a different range. In that case the irq allocation is essentially a cache for configuring vectors within the previously allocated vector range. When dynamic MSI-X allocation is supported, the API still requires some initial set of irqs to be allocated, but also supports allocating and freeing specific irq vectors both within and beyond the initially allocated range. For consistency between modes, as well as to reduce latency and improve reliability of allocations, and also simplicity, this implementation only releases irqs via pci_free_irq_vectors() when either the interrupt mode changes or the device is released. Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Link: https://lore.kernel.org/lkml/20230403211841.0e206b67.alex.williamson@redhat.com/ --- Changes since V3: - Remove vfio_msi_free_irq(). (Alex) - Rework changelog. (Alex) Changes since V2: - Move vfio_irq_ctx_free() to earlier in series to support earlier usage. (Alex) - Use consistent terms in changelog: MSI-x changed to MSI-X. - Make dynamic interrupt context creation generic across all MSI/MSI-X interrupts. This resulted in code moving to earlier in series as part of xarray introduction patch. (Alex) - Remove the local allow_dyn_alloc and direct calling of pci_msix_can_alloc_dyn(), use the new vdev->has_dyn_msix introduced earlier instead. (Alex) - Stop tracking new allocations (remove "new_ctx"). (Alex) - Introduce new wrapper that returns Linux interrupt number or dynamically allocate a new interrupt. Wrapper can be used for all interrupt cases. (Alex) - Only free dynamic MSI-X interrupts on MSI-X teardown. (Alex) Changes since RFC V1: - Add pointer to interrupt context as function parameter to vfio_irq_ctx_free(). (Alex) - Initialize new_ctx to false. (Dan Carpenter) - Only support dynamic allocation if device supports it. (Alex) drivers/vfio/pci/vfio_pci_intrs.c | 47 +++++++++++++++++++++++++++---- 1 file changed, 41 insertions(+), 6 deletions(-)