diff mbox series

[v6,1/3] x86/msi: harden stale pdev handling

Message ID 20241018203913.1162962-2-stewart.hildebrand@amd.com (mailing list archive)
State Superseded
Headers show
Series xen: SR-IOV fixes | expand

Commit Message

Stewart Hildebrand Oct. 18, 2024, 8:39 p.m. UTC
Dom0 normally informs Xen of PCI device removal via
PHYSDEVOP_pci_device_remove, e.g. in response to SR-IOV disable or
hot-unplug. We might find ourselves with stale pdevs if a buggy dom0
fails to report removal via PHYSDEVOP_pci_device_remove. In this case,
attempts to access the config space of the stale pdevs would be invalid
and return all 1s.

Some possible conditions leading to this are:

1. Dom0 disables SR-IOV without reporting VF removal to Xen.

The Linux SR-IOV subsystem normally reports VF removal when a PF driver
disables SR-IOV. In case of a buggy dom0 SR-IOV subsystem, SR-IOV could
become disabled with stale dangling VF pdevs in both dom0 Linux and Xen.

2. Dom0 reporting PF removal without reporting VF removal.

During SR-IOV PF removal (hot-unplug), a buggy PF driver may fail to
disable SR-IOV, thus failing to remove the VFs, leaving stale dangling
VFs behind in both Xen and Linux. At least Linux warns in this case:

[  100.000000]  0000:01:00.0: driver left SR-IOV enabled after remove

In either case, Xen is left with stale VF pdevs, risking invalid PCI
config space accesses.

When Xen is built with CONFIG_DEBUG=y, the following Xen crashes were
observed when dom0 attempted to access the config space of a stale VF:

(XEN) Assertion 'pos' failed at arch/x86/msi.c:1274
(XEN) ----[ Xen-4.20-unstable  x86_64  debug=y  Tainted:   C    ]----
...
(XEN) Xen call trace:
(XEN)    [<ffff82d040346834>] R pci_msi_conf_write_intercept+0xa2/0x1de
(XEN)    [<ffff82d04035d6b4>] F pci_conf_write_intercept+0x68/0x78
(XEN)    [<ffff82d0403264e5>] F arch/x86/pv/emul-priv-op.c#pci_cfg_ok+0xa0/0x114
(XEN)    [<ffff82d04032660e>] F arch/x86/pv/emul-priv-op.c#guest_io_write+0xb5/0x1c8
(XEN)    [<ffff82d0403267bb>] F arch/x86/pv/emul-priv-op.c#write_io+0x9a/0xe0
(XEN)    [<ffff82d04037c77a>] F x86_emulate+0x100e5/0x25f1e
(XEN)    [<ffff82d0403941a8>] F x86_emulate_wrapper+0x29/0x64
(XEN)    [<ffff82d04032802b>] F pv_emulate_privileged_op+0x12e/0x217
(XEN)    [<ffff82d040369f12>] F do_general_protection+0xc2/0x1b8
(XEN)    [<ffff82d040201aa7>] F x86_64/entry.S#handle_exception_saved+0x2b/0x8c

(XEN) Assertion 'pos' failed at arch/x86/msi.c:1246
(XEN) ----[ Xen-4.20-unstable  x86_64  debug=y  Tainted:   C    ]----
...
(XEN) Xen call trace:
(XEN)    [<ffff82d040346b0a>] R pci_reset_msix_state+0x47/0x50
(XEN)    [<ffff82d040287eec>] F pdev_msix_assign+0x19/0x35
(XEN)    [<ffff82d040286184>] F drivers/passthrough/pci.c#assign_device+0x181/0x471
(XEN)    [<ffff82d040287c36>] F iommu_do_pci_domctl+0x248/0x2ec
(XEN)    [<ffff82d040284e1f>] F iommu_do_domctl+0x26/0x44
(XEN)    [<ffff82d0402483b8>] F do_domctl+0x8c1/0x1660
(XEN)    [<ffff82d04032977e>] F pv_hypercall+0x5ce/0x6af
(XEN)    [<ffff82d0402012d3>] F lstar_enter+0x143/0x150

These ASSERTs triggered because the MSI-X capability position can't be
found for a stale pdev.

Latch the capability positions of MSI and MSI-X during device init, and
replace instances of pci_find_cap_offset(..., PCI_CAP_ID_MSI{,X}) with
the stored value. Introduce one additional ASSERT, while the two
existing ASSERTs in question continue to work as intended, even with a
stale pdev.

Fixes: 484d7c852e4f ("x86/MSI-X: track host and guest mask-all requests separately")
Fixes: 575e18d54d19 ("pci: clear {host/guest}_maskall field on assign")
Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
---
v5->v6;
* latch MSI/MSI-X capability position during device init

v4->v5:
* new patch, independent of the rest of the series
* new approach to fixing the issue: don't rely on dom0 to report any
  sort of device removal; rather, fix the condition directly

---
Instructions to reproduce
Requires Xen with CONFIG_DEBUG=y
Tested with Linux 6.11

1. Dom0 disables SR-IOV without reporting VF removal to Xen.

* Hack the Linux SR-IOV subsystem to remove the call to
  pci_stop_and_remove_bus_device() in
  drivers/pci/iov.c:pci_iov_remove_virtfn().

* Enable SR-IOV, then disable SR-IOV
  echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/sriov_numvfs
  echo 0 > /sys/bus/pci/devices/0000\:01\:00.0/sriov_numvfs

* Now we have a stale VF. We can trigger the ASSERT either by unbinding
  the VF driver and issuing a reset...

  echo 0000\:01\:10.0 > /sys/bus/pci/devices/0000\:01\:10.0/driver/unbind
  echo 1 > /sys/bus/pci/devices/0000\:01\:10.0/reset

  ... or by doing xl pci-assignable-add

  xl pci-assignable-add 01:10.0

2. Dom0 reporting PF removal without reporting VF removal.

* Hack your PF driver to leave SR-IOV enabled when removing the device

* Enable SR-IOV

  echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/sriov_numvfs

* Unplug the PCI device

  (qemu) device_del mydev

* Now we have a stale VF. We can trigger the ASSERT either by re-adding
  the PF device with SR-IOV disabled...

  echo 0000\:01\:10.0 > /sys/bus/pci/devices/0000\:01\:10.0/driver/unbind
  (qemu) device_add igb,id=mydev,bus=pcie.1,netdev=net1

  ... or by reset / xl pci-assignable-add as above.
---
 xen/arch/x86/msi.c            | 19 +++++++++----------
 xen/drivers/passthrough/msi.c |  3 +++
 xen/drivers/vpci/msi.c        |  2 +-
 xen/drivers/vpci/msix.c       |  2 +-
 xen/include/xen/pci.h         |  3 +++
 5 files changed, 17 insertions(+), 12 deletions(-)

Comments

Jan Beulich Oct. 28, 2024, 4:58 p.m. UTC | #1
On 18.10.2024 22:39, Stewart Hildebrand wrote:
> Dom0 normally informs Xen of PCI device removal via
> PHYSDEVOP_pci_device_remove, e.g. in response to SR-IOV disable or
> hot-unplug. We might find ourselves with stale pdevs if a buggy dom0
> fails to report removal via PHYSDEVOP_pci_device_remove. In this case,
> attempts to access the config space of the stale pdevs would be invalid
> and return all 1s.
> 
> Some possible conditions leading to this are:
> 
> 1. Dom0 disables SR-IOV without reporting VF removal to Xen.
> 
> The Linux SR-IOV subsystem normally reports VF removal when a PF driver
> disables SR-IOV. In case of a buggy dom0 SR-IOV subsystem, SR-IOV could
> become disabled with stale dangling VF pdevs in both dom0 Linux and Xen.
> 
> 2. Dom0 reporting PF removal without reporting VF removal.
> 
> During SR-IOV PF removal (hot-unplug), a buggy PF driver may fail to
> disable SR-IOV, thus failing to remove the VFs, leaving stale dangling
> VFs behind in both Xen and Linux. At least Linux warns in this case:
> 
> [  100.000000]  0000:01:00.0: driver left SR-IOV enabled after remove
> 
> In either case, Xen is left with stale VF pdevs, risking invalid PCI
> config space accesses.
> 
> When Xen is built with CONFIG_DEBUG=y, the following Xen crashes were
> observed when dom0 attempted to access the config space of a stale VF:
> 
> (XEN) Assertion 'pos' failed at arch/x86/msi.c:1274
> (XEN) ----[ Xen-4.20-unstable  x86_64  debug=y  Tainted:   C    ]----
> ...
> (XEN) Xen call trace:
> (XEN)    [<ffff82d040346834>] R pci_msi_conf_write_intercept+0xa2/0x1de
> (XEN)    [<ffff82d04035d6b4>] F pci_conf_write_intercept+0x68/0x78
> (XEN)    [<ffff82d0403264e5>] F arch/x86/pv/emul-priv-op.c#pci_cfg_ok+0xa0/0x114
> (XEN)    [<ffff82d04032660e>] F arch/x86/pv/emul-priv-op.c#guest_io_write+0xb5/0x1c8
> (XEN)    [<ffff82d0403267bb>] F arch/x86/pv/emul-priv-op.c#write_io+0x9a/0xe0
> (XEN)    [<ffff82d04037c77a>] F x86_emulate+0x100e5/0x25f1e
> (XEN)    [<ffff82d0403941a8>] F x86_emulate_wrapper+0x29/0x64
> (XEN)    [<ffff82d04032802b>] F pv_emulate_privileged_op+0x12e/0x217
> (XEN)    [<ffff82d040369f12>] F do_general_protection+0xc2/0x1b8
> (XEN)    [<ffff82d040201aa7>] F x86_64/entry.S#handle_exception_saved+0x2b/0x8c
> 
> (XEN) Assertion 'pos' failed at arch/x86/msi.c:1246
> (XEN) ----[ Xen-4.20-unstable  x86_64  debug=y  Tainted:   C    ]----
> ...
> (XEN) Xen call trace:
> (XEN)    [<ffff82d040346b0a>] R pci_reset_msix_state+0x47/0x50
> (XEN)    [<ffff82d040287eec>] F pdev_msix_assign+0x19/0x35
> (XEN)    [<ffff82d040286184>] F drivers/passthrough/pci.c#assign_device+0x181/0x471
> (XEN)    [<ffff82d040287c36>] F iommu_do_pci_domctl+0x248/0x2ec
> (XEN)    [<ffff82d040284e1f>] F iommu_do_domctl+0x26/0x44
> (XEN)    [<ffff82d0402483b8>] F do_domctl+0x8c1/0x1660
> (XEN)    [<ffff82d04032977e>] F pv_hypercall+0x5ce/0x6af
> (XEN)    [<ffff82d0402012d3>] F lstar_enter+0x143/0x150
> 
> These ASSERTs triggered because the MSI-X capability position can't be
> found for a stale pdev.
> 
> Latch the capability positions of MSI and MSI-X during device init, and
> replace instances of pci_find_cap_offset(..., PCI_CAP_ID_MSI{,X}) with
> the stored value. Introduce one additional ASSERT, while the two
> existing ASSERTs in question continue to work as intended, even with a
> stale pdev.
> 
> Fixes: 484d7c852e4f ("x86/MSI-X: track host and guest mask-all requests separately")
> Fixes: 575e18d54d19 ("pci: clear {host/guest}_maskall field on assign")
> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>

Looks largely okay to me now, just two type selection aspects:

> --- a/xen/arch/x86/msi.c
> +++ b/xen/arch/x86/msi.c
> @@ -278,23 +278,21 @@ void __msi_set_enable(u16 seg, u8 bus, u8 slot, u8 func, int pos, int enable)
>  
>  static void msi_set_enable(struct pci_dev *dev, int enable)
>  {
> -    int pos;
> +    int pos = dev->msi_pos;

This and ...

>      u16 seg = dev->seg;
>      u8 bus = dev->bus;
>      u8 slot = PCI_SLOT(dev->devfn);
>      u8 func = PCI_FUNC(dev->devfn);
>  
> -    pos = pci_find_cap_offset(dev->sbdf, PCI_CAP_ID_MSI);
>      if ( pos )
>          __msi_set_enable(seg, bus, slot, func, pos, enable);
>  }
>  
>  static void msix_set_enable(struct pci_dev *dev, int enable)
>  {
> -    int pos;
> +    int pos = dev->msix_pos;

... this want to become unsigned int at this occasion, imo. Like we have ...

> @@ -764,7 +762,7 @@ static int msix_capability_init(struct pci_dev *dev,
>      u8 slot = PCI_SLOT(dev->devfn);
>      u8 func = PCI_FUNC(dev->devfn);
>      bool maskall = msix->host_maskall, zap_on_error = false;
> -    unsigned int pos = pci_find_cap_offset(dev->sbdf, PCI_CAP_ID_MSIX);
> +    unsigned int pos = dev->msix_pos;

... e.g. here already.

> --- a/xen/include/xen/pci.h
> +++ b/xen/include/xen/pci.h
> @@ -113,6 +113,9 @@ struct pci_dev {
>          pci_sbdf_t sbdf;
>      };
>  
> +    unsigned int msi_pos;
> +    unsigned int msix_pos;
> +
>      uint8_t msi_maxvec;
>      uint8_t phantom_stride;

As can be seen from the subsequent members, we're trying to be space
conserving here. Both fields won't require more than 8 bits, so uint8_t
or unsigned char would be the better type to use. Again imo. Preferably
with those adjustments (which could likely be done while committing)
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan
Roger Pau Monné Oct. 28, 2024, 5:53 p.m. UTC | #2
On Mon, Oct 28, 2024 at 05:58:28PM +0100, Jan Beulich wrote:
> On 18.10.2024 22:39, Stewart Hildebrand wrote:
> > Dom0 normally informs Xen of PCI device removal via
> > PHYSDEVOP_pci_device_remove, e.g. in response to SR-IOV disable or
> > hot-unplug. We might find ourselves with stale pdevs if a buggy dom0
> > fails to report removal via PHYSDEVOP_pci_device_remove. In this case,
> > attempts to access the config space of the stale pdevs would be invalid
> > and return all 1s.
> > 
> > Some possible conditions leading to this are:
> > 
> > 1. Dom0 disables SR-IOV without reporting VF removal to Xen.
> > 
> > The Linux SR-IOV subsystem normally reports VF removal when a PF driver
> > disables SR-IOV. In case of a buggy dom0 SR-IOV subsystem, SR-IOV could
> > become disabled with stale dangling VF pdevs in both dom0 Linux and Xen.
> > 
> > 2. Dom0 reporting PF removal without reporting VF removal.
> > 
> > During SR-IOV PF removal (hot-unplug), a buggy PF driver may fail to
> > disable SR-IOV, thus failing to remove the VFs, leaving stale dangling
> > VFs behind in both Xen and Linux. At least Linux warns in this case:
> > 
> > [  100.000000]  0000:01:00.0: driver left SR-IOV enabled after remove
> > 
> > In either case, Xen is left with stale VF pdevs, risking invalid PCI
> > config space accesses.
> > 
> > When Xen is built with CONFIG_DEBUG=y, the following Xen crashes were
> > observed when dom0 attempted to access the config space of a stale VF:
> > 
> > (XEN) Assertion 'pos' failed at arch/x86/msi.c:1274
> > (XEN) ----[ Xen-4.20-unstable  x86_64  debug=y  Tainted:   C    ]----
> > ...
> > (XEN) Xen call trace:
> > (XEN)    [<ffff82d040346834>] R pci_msi_conf_write_intercept+0xa2/0x1de
> > (XEN)    [<ffff82d04035d6b4>] F pci_conf_write_intercept+0x68/0x78
> > (XEN)    [<ffff82d0403264e5>] F arch/x86/pv/emul-priv-op.c#pci_cfg_ok+0xa0/0x114
> > (XEN)    [<ffff82d04032660e>] F arch/x86/pv/emul-priv-op.c#guest_io_write+0xb5/0x1c8
> > (XEN)    [<ffff82d0403267bb>] F arch/x86/pv/emul-priv-op.c#write_io+0x9a/0xe0
> > (XEN)    [<ffff82d04037c77a>] F x86_emulate+0x100e5/0x25f1e
> > (XEN)    [<ffff82d0403941a8>] F x86_emulate_wrapper+0x29/0x64
> > (XEN)    [<ffff82d04032802b>] F pv_emulate_privileged_op+0x12e/0x217
> > (XEN)    [<ffff82d040369f12>] F do_general_protection+0xc2/0x1b8
> > (XEN)    [<ffff82d040201aa7>] F x86_64/entry.S#handle_exception_saved+0x2b/0x8c
> > 
> > (XEN) Assertion 'pos' failed at arch/x86/msi.c:1246
> > (XEN) ----[ Xen-4.20-unstable  x86_64  debug=y  Tainted:   C    ]----
> > ...
> > (XEN) Xen call trace:
> > (XEN)    [<ffff82d040346b0a>] R pci_reset_msix_state+0x47/0x50
> > (XEN)    [<ffff82d040287eec>] F pdev_msix_assign+0x19/0x35
> > (XEN)    [<ffff82d040286184>] F drivers/passthrough/pci.c#assign_device+0x181/0x471
> > (XEN)    [<ffff82d040287c36>] F iommu_do_pci_domctl+0x248/0x2ec
> > (XEN)    [<ffff82d040284e1f>] F iommu_do_domctl+0x26/0x44
> > (XEN)    [<ffff82d0402483b8>] F do_domctl+0x8c1/0x1660
> > (XEN)    [<ffff82d04032977e>] F pv_hypercall+0x5ce/0x6af
> > (XEN)    [<ffff82d0402012d3>] F lstar_enter+0x143/0x150
> > 
> > These ASSERTs triggered because the MSI-X capability position can't be
> > found for a stale pdev.
> > 
> > Latch the capability positions of MSI and MSI-X during device init, and
> > replace instances of pci_find_cap_offset(..., PCI_CAP_ID_MSI{,X}) with
> > the stored value. Introduce one additional ASSERT, while the two
> > existing ASSERTs in question continue to work as intended, even with a
> > stale pdev.
> > 
> > Fixes: 484d7c852e4f ("x86/MSI-X: track host and guest mask-all requests separately")
> > Fixes: 575e18d54d19 ("pci: clear {host/guest}_maskall field on assign")
> > Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
> 
> Looks largely okay to me now, just two type selection aspects:
> 
> > --- a/xen/arch/x86/msi.c
> > +++ b/xen/arch/x86/msi.c
> > @@ -278,23 +278,21 @@ void __msi_set_enable(u16 seg, u8 bus, u8 slot, u8 func, int pos, int enable)
> >  
> >  static void msi_set_enable(struct pci_dev *dev, int enable)
> >  {
> > -    int pos;
> > +    int pos = dev->msi_pos;
> 
> This and ...
> 
> >      u16 seg = dev->seg;
> >      u8 bus = dev->bus;
> >      u8 slot = PCI_SLOT(dev->devfn);
> >      u8 func = PCI_FUNC(dev->devfn);
> >  
> > -    pos = pci_find_cap_offset(dev->sbdf, PCI_CAP_ID_MSI);
> >      if ( pos )
> >          __msi_set_enable(seg, bus, slot, func, pos, enable);
> >  }
> >  
> >  static void msix_set_enable(struct pci_dev *dev, int enable)
> >  {
> > -    int pos;
> > +    int pos = dev->msix_pos;
> 
> ... this want to become unsigned int at this occasion, imo. Like we have ...
> 
> > @@ -764,7 +762,7 @@ static int msix_capability_init(struct pci_dev *dev,
> >      u8 slot = PCI_SLOT(dev->devfn);
> >      u8 func = PCI_FUNC(dev->devfn);
> >      bool maskall = msix->host_maskall, zap_on_error = false;
> > -    unsigned int pos = pci_find_cap_offset(dev->sbdf, PCI_CAP_ID_MSIX);
> > +    unsigned int pos = dev->msix_pos;
> 
> ... e.g. here already.
> 
> > --- a/xen/include/xen/pci.h
> > +++ b/xen/include/xen/pci.h
> > @@ -113,6 +113,9 @@ struct pci_dev {
> >          pci_sbdf_t sbdf;
> >      };
> >  
> > +    unsigned int msi_pos;
> > +    unsigned int msix_pos;
> > +
> >      uint8_t msi_maxvec;
> >      uint8_t phantom_stride;
> 
> As can be seen from the subsequent members, we're trying to be space
> conserving here. Both fields won't require more than 8 bits, so uint8_t
> or unsigned char would be the better type to use. Again imo. Preferably
> with those adjustments (which could likely be done while committing)
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

uint8_t would seem preferable here, as it's fixed-size width clearly
related to the offset into the PCI configuration space for a device.

It might also be worth noting in the commit message that having the
position cached should be a small perf improvement, by not having to
walk the capability list each time.

Anyway, no strong opinion about the commit message adjustment, so with
the type changed:

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.
Jan Beulich Oct. 31, 2024, 11:29 a.m. UTC | #3
On 28.10.2024 18:53, Roger Pau Monné wrote:
> Anyway, no strong opinion about the commit message adjustment, so with
> the type changed:

Btw, while preparing this patch for committing I ended up confused by this:
I can't find any request to adjust the commit message. The only other thing
I had asked for where plain int -> unsigned int adjustments.

Jan
diff mbox series

Patch

diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
index ff2e3d86878d..5e24df7be0c0 100644
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -278,23 +278,21 @@  void __msi_set_enable(u16 seg, u8 bus, u8 slot, u8 func, int pos, int enable)
 
 static void msi_set_enable(struct pci_dev *dev, int enable)
 {
-    int pos;
+    int pos = dev->msi_pos;
     u16 seg = dev->seg;
     u8 bus = dev->bus;
     u8 slot = PCI_SLOT(dev->devfn);
     u8 func = PCI_FUNC(dev->devfn);
 
-    pos = pci_find_cap_offset(dev->sbdf, PCI_CAP_ID_MSI);
     if ( pos )
         __msi_set_enable(seg, bus, slot, func, pos, enable);
 }
 
 static void msix_set_enable(struct pci_dev *dev, int enable)
 {
-    int pos;
+    int pos = dev->msix_pos;
     uint16_t control;
 
-    pos = pci_find_cap_offset(dev->sbdf, PCI_CAP_ID_MSIX);
     if ( pos )
     {
         control = pci_conf_read16(dev->sbdf, msix_control_reg(pos));
@@ -601,7 +599,7 @@  static int msi_capability_init(struct pci_dev *dev,
     uint16_t control;
 
     ASSERT_PDEV_LIST_IS_READ_LOCKED(dev->domain);
-    pos = pci_find_cap_offset(dev->sbdf, PCI_CAP_ID_MSI);
+    pos = dev->msi_pos;
     if ( !pos )
         return -ENODEV;
     control = pci_conf_read16(dev->sbdf, msi_control_reg(pos));
@@ -764,7 +762,7 @@  static int msix_capability_init(struct pci_dev *dev,
     u8 slot = PCI_SLOT(dev->devfn);
     u8 func = PCI_FUNC(dev->devfn);
     bool maskall = msix->host_maskall, zap_on_error = false;
-    unsigned int pos = pci_find_cap_offset(dev->sbdf, PCI_CAP_ID_MSIX);
+    unsigned int pos = dev->msix_pos;
 
     if ( !pos )
         return -ENODEV;
@@ -1133,11 +1131,13 @@  static void _pci_cleanup_msix(struct arch_msix *msix)
 static void __pci_disable_msix(struct msi_desc *entry)
 {
     struct pci_dev *dev = entry->dev;
-    unsigned int pos = pci_find_cap_offset(dev->sbdf, PCI_CAP_ID_MSIX);
+    unsigned int pos = dev->msix_pos;
     u16 control = pci_conf_read16(dev->sbdf,
                                   msix_control_reg(entry->msi_attrib.pos));
     bool maskall = dev->msix->host_maskall;
 
+    ASSERT(pos);
+
     if ( unlikely(!(control & PCI_MSIX_FLAGS_ENABLE)) )
     {
         dev->msix->host_maskall = 1;
@@ -1241,7 +1241,7 @@  void pci_cleanup_msi(struct pci_dev *pdev)
 
 int pci_reset_msix_state(struct pci_dev *pdev)
 {
-    unsigned int pos = pci_find_cap_offset(pdev->sbdf, PCI_CAP_ID_MSIX);
+    unsigned int pos = pdev->msix_pos;
 
     ASSERT(pos);
     /*
@@ -1269,8 +1269,7 @@  int pci_msi_conf_write_intercept(struct pci_dev *pdev, unsigned int reg,
     if ( pdev->msix )
     {
         entry = find_msi_entry(pdev, -1, PCI_CAP_ID_MSIX);
-        pos = entry ? entry->msi_attrib.pos
-                    : pci_find_cap_offset(pdev->sbdf, PCI_CAP_ID_MSIX);
+        pos = entry ? entry->msi_attrib.pos : pdev->msix_pos;
         ASSERT(pos);
 
         if ( reg >= pos && reg < msix_pba_offset_reg(pos) + 4 )
diff --git a/xen/drivers/passthrough/msi.c b/xen/drivers/passthrough/msi.c
index 13d904692ef8..ed2bc7ebe635 100644
--- a/xen/drivers/passthrough/msi.c
+++ b/xen/drivers/passthrough/msi.c
@@ -29,6 +29,7 @@  int pdev_msi_init(struct pci_dev *pdev)
     {
         uint16_t ctrl = pci_conf_read16(pdev->sbdf, msi_control_reg(pos));
 
+        pdev->msi_pos = pos;
         pdev->msi_maxvec = multi_msi_capable(ctrl);
     }
 
@@ -41,6 +42,8 @@  int pdev_msi_init(struct pci_dev *pdev)
         if ( !msix )
             return -ENOMEM;
 
+        pdev->msix_pos = pos;
+
         spin_lock_init(&msix->table_lock);
 
         ctrl = pci_conf_read16(pdev->sbdf, msix_control_reg(pos));
diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
index dd6620ec5674..66e5a8a116be 100644
--- a/xen/drivers/vpci/msi.c
+++ b/xen/drivers/vpci/msi.c
@@ -195,7 +195,7 @@  static void cf_check mask_write(
 
 static int cf_check init_msi(struct pci_dev *pdev)
 {
-    unsigned int pos = pci_find_cap_offset(pdev->sbdf, PCI_CAP_ID_MSI);
+    unsigned int pos = pdev->msi_pos;
     uint16_t control;
     int ret;
 
diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
index 5bb4444ce21f..6bd8c55bb48e 100644
--- a/xen/drivers/vpci/msix.c
+++ b/xen/drivers/vpci/msix.c
@@ -711,7 +711,7 @@  static int cf_check init_msix(struct pci_dev *pdev)
     struct vpci_msix *msix;
     int rc;
 
-    msix_offset = pci_find_cap_offset(pdev->sbdf, PCI_CAP_ID_MSIX);
+    msix_offset = pdev->msix_pos;
     if ( !msix_offset )
         return 0;
 
diff --git a/xen/include/xen/pci.h b/xen/include/xen/pci.h
index 63e49f0117e9..ef56e80651d6 100644
--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -113,6 +113,9 @@  struct pci_dev {
         pci_sbdf_t sbdf;
     };
 
+    unsigned int msi_pos;
+    unsigned int msix_pos;
+
     uint8_t msi_maxvec;
     uint8_t phantom_stride;