diff mbox

[v2] xen-hvm: stop faking I/O to access PCI config space

Message ID 1526648406-1746-1-git-send-email-paul.durrant@citrix.com (mailing list archive)
State New, archived
Headers show

Commit Message

Paul Durrant May 18, 2018, 1 p.m. UTC
This patch removes the current hackery where IOREQ_TYPE_PCI_CONFIG
requests are handled by faking PIO to 0xcf8 and 0xcfc and replaces it
with direct calls to pci_host_config_read/write_common().
Doing so necessitates mapping BDFs to PCIDevices but maintaining a simple
QLIST in xen_device_realize/unrealize() will suffice.

NOTE: whilst config space accesses are currently limited to
      PCI_CONFIG_SPACE_SIZE, this patch paves the way to increasing the
      limit to PCIE_CONFIG_SPACE_SIZE when Xen gains the ability to
      emulate MCFG table accesses.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
--
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Anthony Perard <anthony.perard@citrix.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Marcel Apfelbaum <marcel@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Roger Pau Monne <roger.pau@citrix.com>

v2:
 - Introduce a helper function rw_config_req_item() to handle config
   register offset calculation
 - Handle req->count > 1 more like MMIO rather than PIO
 - Added Roger to cc list (not taking R-b because of significant change)
---
 hw/i386/xen/trace-events |   2 +
 hw/i386/xen/xen-hvm.c    | 120 +++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 102 insertions(+), 20 deletions(-)

Comments

Jan Beulich May 18, 2018, 1:33 p.m. UTC | #1
>>> On 18.05.18 at 15:00, <paul.durrant@citrix.com> wrote:
> @@ -903,6 +926,80 @@ static void cpu_ioreq_move(ioreq_t *req)
>      }
>  }
>  
> +static void rw_config_req_item(XenPciDevice *xendev, ioreq_t *req,

It looks to me as if both parameters could be constified.

> +                               uint32_t i, uint32_t *val)
> +{
> +    int32_t reg = req->addr;
> +    uint32_t offset = req->size * i;
> +
> +    reg += (req->df ? -1 : 1) * offset;
> +    if (reg < 0 || reg > PCI_CONFIG_SPACE_SIZE) {

Having fought a number of issues in this area in the hypervisor a couple
of years back I wonder
- why reg is of signed type,
- whether overflow of the first multiplication really doesn't matter,
- whether wrapping when adding in the offset is not an issue.

I take it that the rather lax upper bound check (should imo really be
reg + size > PCI_CONFIG_SPACE_SIZE [implying reg + size doesn't
itself wrap], or at least reg >= PCI_CONFIG_SPACE_SIZE) is not a
problem because ...

> +        if (req->dir == IOREQ_READ) {
> +            *val = ~0u;
> +        }
> +        return;
> +    }
> +
> +    if (req->dir == IOREQ_READ) {
> +        *val = pci_host_config_read_common(xendev->pci_dev, reg,
> +                                           PCI_CONFIG_SPACE_SIZE,
> +                                           req->size);
> +        trace_cpu_ioreq_config_read(req, xendev->sbdf, reg,
> +                                    req->size, *val);
> +    } else {
> +        trace_cpu_ioreq_config_write(req, xendev->sbdf, reg, req->size,
> +                                     *val);
> +        pci_host_config_write_common(xendev->pci_dev, reg,
> +                                     PCI_CONFIG_SPACE_SIZE, *val,
> +                                     req->size);
> +    }

... these called functions do full checking anyway?

> +static void cpu_ioreq_config(XenIOState *state, ioreq_t *req)
> +{
> +    uint32_t sbdf = req->addr >> 32;
> +    XenPciDevice *xendev;
> +
> +    if (req->size > sizeof(uint32_t)) {
> +        hw_error("PCI config access: bad size (%u)", req->size);

What about size 0 or 3?

> +    }
> +
> +    QLIST_FOREACH(xendev, &state->dev_list, entry) {
> +        unsigned int i;
> +        uint32_t tmp;
> +
> +        if (xendev->sbdf != sbdf) {
> +            continue;
> +        }
> +
> +        if (!req->data_is_ptr) {
> +            if (req->dir == IOREQ_READ) {
> +                for (i = 0; i < req->count; i++) {
> +                    rw_config_req_item(xendev, req, i, &tmp);
> +                    req->data = tmp;
> +                }
> +            } else if (req->dir == IOREQ_WRITE) {
> +                for (i = 0; i < req->count; i++) {
> +                    tmp = req->data;
> +                    rw_config_req_item(xendev, req, i, &tmp);
> +                }
> +            }

Wouldn't it be more sensible to fail req->count != 1 requests here?

Jan
Paul Durrant May 18, 2018, 1:51 p.m. UTC | #2
> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 18 May 2018 14:34
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: Anthony Perard <anthony.perard@citrix.com>; Roger Pau Monne
> <roger.pau@citrix.com>; Stefano Stabellini <sstabellini@kernel.org>; xen-
> devel <xen-devel@lists.xenproject.org>; qemu-devel@nongnu.org;
> ehabkost@redhat.com; marcel@redhat.com; mst@redhat.com; Paolo
> Bonzini <pbonzini@redhat.com>; Richard Henderson <rth@twiddle.net>
> Subject: Re: [Xen-devel] [PATCH v2] xen-hvm: stop faking I/O to access PCI
> config space
> 
> >>> On 18.05.18 at 15:00, <paul.durrant@citrix.com> wrote:
> > @@ -903,6 +926,80 @@ static void cpu_ioreq_move(ioreq_t *req)
> >      }
> >  }
> >
> > +static void rw_config_req_item(XenPciDevice *xendev, ioreq_t *req,
> 
> It looks to me as if both parameters could be constified.
> 

They could for this function, yes.

> > +                               uint32_t i, uint32_t *val)
> > +{
> > +    int32_t reg = req->addr;
> > +    uint32_t offset = req->size * i;
> > +
> > +    reg += (req->df ? -1 : 1) * offset;
> > +    if (reg < 0 || reg > PCI_CONFIG_SPACE_SIZE) {
> 
> Having fought a number of issues in this area in the hypervisor a couple
> of years back I wonder
> - why reg is of signed type,

I did that so I could do a < 0 check.

> - whether overflow of the first multiplication really doesn't matter,

It would be better to check it.

> - whether wrapping when adding in the offset is not an issue.
> 

I'll do limits check on offset then... should be able to make reg unsigned then I guess.

> I take it that the rather lax upper bound check (should imo really be
> reg + size > PCI_CONFIG_SPACE_SIZE [implying reg + size doesn't
> itself wrap], or at least reg >= PCI_CONFIG_SPACE_SIZE) is not a
> problem because ...
> 
> > +        if (req->dir == IOREQ_READ) {
> > +            *val = ~0u;
> > +        }
> > +        return;
> > +    }
> > +
> > +    if (req->dir == IOREQ_READ) {
> > +        *val = pci_host_config_read_common(xendev->pci_dev, reg,
> > +                                           PCI_CONFIG_SPACE_SIZE,
> > +                                           req->size);
> > +        trace_cpu_ioreq_config_read(req, xendev->sbdf, reg,
> > +                                    req->size, *val);
> > +    } else {
> > +        trace_cpu_ioreq_config_write(req, xendev->sbdf, reg, req->size,
> > +                                     *val);
> > +        pci_host_config_write_common(xendev->pci_dev, reg,
> > +                                     PCI_CONFIG_SPACE_SIZE, *val,
> > +                                     req->size);
> > +    }
> 
> ... these called functions do full checking anyway?

Yes, I'm deferring further checking to these common functions. I'm only intending to avoid passing junk into them here.

> 
> > +static void cpu_ioreq_config(XenIOState *state, ioreq_t *req)
> > +{
> > +    uint32_t sbdf = req->addr >> 32;
> > +    XenPciDevice *xendev;
> > +
> > +    if (req->size > sizeof(uint32_t)) {
> > +        hw_error("PCI config access: bad size (%u)", req->size);
> 
> What about size 0 or 3?
> 

Yes, I can reject those here also.

> > +    }
> > +
> > +    QLIST_FOREACH(xendev, &state->dev_list, entry) {
> > +        unsigned int i;
> > +        uint32_t tmp;
> > +
> > +        if (xendev->sbdf != sbdf) {
> > +            continue;
> > +        }
> > +
> > +        if (!req->data_is_ptr) {
> > +            if (req->dir == IOREQ_READ) {
> > +                for (i = 0; i < req->count; i++) {
> > +                    rw_config_req_item(xendev, req, i, &tmp);
> > +                    req->data = tmp;
> > +                }
> > +            } else if (req->dir == IOREQ_WRITE) {
> > +                for (i = 0; i < req->count; i++) {
> > +                    tmp = req->data;
> > +                    rw_config_req_item(xendev, req, i, &tmp);
> > +                }
> > +            }
> 
> Wouldn't it be more sensible to fail req->count != 1 requests here?
> 

I'm wondering whether we'd want to handle count > 1 once we allow MMCONFIG accesses though. I guess it would be easier just to defer that.

  Paul

> Jan
>
Jan Beulich May 18, 2018, 2:15 p.m. UTC | #3
>>> On 18.05.18 at 15:51, <Paul.Durrant@citrix.com> wrote:
>> Sent: 18 May 2018 14:34
>> To: Paul Durrant <Paul.Durrant@citrix.com>
>> >>> On 18.05.18 at 15:00, <paul.durrant@citrix.com> wrote:
>> > +    QLIST_FOREACH(xendev, &state->dev_list, entry) {
>> > +        unsigned int i;
>> > +        uint32_t tmp;
>> > +
>> > +        if (xendev->sbdf != sbdf) {
>> > +            continue;
>> > +        }
>> > +
>> > +        if (!req->data_is_ptr) {
>> > +            if (req->dir == IOREQ_READ) {
>> > +                for (i = 0; i < req->count; i++) {
>> > +                    rw_config_req_item(xendev, req, i, &tmp);
>> > +                    req->data = tmp;
>> > +                }
>> > +            } else if (req->dir == IOREQ_WRITE) {
>> > +                for (i = 0; i < req->count; i++) {
>> > +                    tmp = req->data;
>> > +                    rw_config_req_item(xendev, req, i, &tmp);
>> > +                }
>> > +            }
>> 
>> Wouldn't it be more sensible to fail req->count != 1 requests here?
>> 
> 
> I'm wondering whether we'd want to handle count > 1 once we allow MMCONFIG 
> accesses though. I guess it would be easier just to defer that.

For the data_is_ptr case - sure. But here? Or wait - are you thinking about
REP STOS (and the relatively useless REP LODS)?

Jan
Paul Durrant May 18, 2018, 2:22 p.m. UTC | #4
> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 18 May 2018 15:16
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: Anthony Perard <anthony.perard@citrix.com>; Roger Pau Monne
> <roger.pau@citrix.com>; Stefano Stabellini <sstabellini@kernel.org>; xen-
> devel <xen-devel@lists.xenproject.org>; qemu-devel@nongnu.org;
> ehabkost@redhat.com; marcel@redhat.com; mst@redhat.com; Paolo
> Bonzini <pbonzini@redhat.com>; Richard Henderson <rth@twiddle.net>
> Subject: RE: [Xen-devel] [PATCH v2] xen-hvm: stop faking I/O to access PCI
> config space
> 
> >>> On 18.05.18 at 15:51, <Paul.Durrant@citrix.com> wrote:
> >> Sent: 18 May 2018 14:34
> >> To: Paul Durrant <Paul.Durrant@citrix.com>
> >> >>> On 18.05.18 at 15:00, <paul.durrant@citrix.com> wrote:
> >> > +    QLIST_FOREACH(xendev, &state->dev_list, entry) {
> >> > +        unsigned int i;
> >> > +        uint32_t tmp;
> >> > +
> >> > +        if (xendev->sbdf != sbdf) {
> >> > +            continue;
> >> > +        }
> >> > +
> >> > +        if (!req->data_is_ptr) {
> >> > +            if (req->dir == IOREQ_READ) {
> >> > +                for (i = 0; i < req->count; i++) {
> >> > +                    rw_config_req_item(xendev, req, i, &tmp);
> >> > +                    req->data = tmp;
> >> > +                }
> >> > +            } else if (req->dir == IOREQ_WRITE) {
> >> > +                for (i = 0; i < req->count; i++) {
> >> > +                    tmp = req->data;
> >> > +                    rw_config_req_item(xendev, req, i, &tmp);
> >> > +                }
> >> > +            }
> >>
> >> Wouldn't it be more sensible to fail req->count != 1 requests here?
> >>
> >
> > I'm wondering whether we'd want to handle count > 1 once we allow
> MMCONFIG
> > accesses though. I guess it would be easier just to defer that.
> 
> For the data_is_ptr case - sure. But here? Or wait - are you thinking about
> REP STOS (and the relatively useless REP LODS)?
> 

Yes. We'd need to cope with a rep stos if we had memory mapped access, but we don't need to worry about it until then I think.

In the meantime I doubt any well behaved OS is going to do rep ins or rep outs to cfc so just aborting on count > 1 is probably fine.

  Paul

> Jan
>
diff mbox

Patch

diff --git a/hw/i386/xen/trace-events b/hw/i386/xen/trace-events
index 8dab7bc..f576f1b 100644
--- a/hw/i386/xen/trace-events
+++ b/hw/i386/xen/trace-events
@@ -15,6 +15,8 @@  cpu_ioreq_pio(void *req, uint32_t dir, uint32_t df, uint32_t data_is_ptr, uint64
 cpu_ioreq_pio_read_reg(void *req, uint64_t data, uint64_t addr, uint32_t size) "I/O=%p pio read reg data=0x%"PRIx64" port=0x%"PRIx64" size=%d"
 cpu_ioreq_pio_write_reg(void *req, uint64_t data, uint64_t addr, uint32_t size) "I/O=%p pio write reg data=0x%"PRIx64" port=0x%"PRIx64" size=%d"
 cpu_ioreq_move(void *req, uint32_t dir, uint32_t df, uint32_t data_is_ptr, uint64_t addr, uint64_t data, uint32_t count, uint32_t size) "I/O=%p copy dir=%d df=%d ptr=%d port=0x%"PRIx64" data=0x%"PRIx64" count=%d size=%d"
+cpu_ioreq_config_read(void *req, uint32_t sbdf, uint32_t reg, uint32_t size, uint32_t data) "I/O=%p sbdf=0x%x reg=%u size=%u data=0x%x"
+cpu_ioreq_config_write(void *req, uint32_t sbdf, uint32_t reg, uint32_t size, uint32_t data) "I/O=%p sbdf=0x%x reg=%u size=%u data=0x%x"
 
 # xen-mapcache.c
 xen_map_cache(uint64_t phys_addr) "want 0x%"PRIx64
diff --git a/hw/i386/xen/xen-hvm.c b/hw/i386/xen/xen-hvm.c
index caa563b..d79b1d6 100644
--- a/hw/i386/xen/xen-hvm.c
+++ b/hw/i386/xen/xen-hvm.c
@@ -12,6 +12,7 @@ 
 
 #include "cpu.h"
 #include "hw/pci/pci.h"
+#include "hw/pci/pci_host.h"
 #include "hw/i386/pc.h"
 #include "hw/i386/apic-msidef.h"
 #include "hw/xen/xen_common.h"
@@ -86,6 +87,12 @@  typedef struct XenPhysmap {
     QLIST_ENTRY(XenPhysmap) list;
 } XenPhysmap;
 
+typedef struct XenPciDevice {
+    PCIDevice *pci_dev;
+    uint32_t sbdf;
+    QLIST_ENTRY(XenPciDevice) entry;
+} XenPciDevice;
+
 typedef struct XenIOState {
     ioservid_t ioservid;
     shared_iopage_t *shared_page;
@@ -105,6 +112,7 @@  typedef struct XenIOState {
     struct xs_handle *xenstore;
     MemoryListener memory_listener;
     MemoryListener io_listener;
+    QLIST_HEAD(, XenPciDevice) dev_list;
     DeviceListener device_listener;
     QLIST_HEAD(, XenPhysmap) physmap;
     hwaddr free_phys_offset;
@@ -569,6 +577,12 @@  static void xen_device_realize(DeviceListener *listener,
 
     if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
         PCIDevice *pci_dev = PCI_DEVICE(dev);
+        XenPciDevice *xendev = g_new(XenPciDevice, 1);
+
+        xendev->pci_dev = pci_dev;
+        xendev->sbdf = PCI_BUILD_BDF(pci_dev_bus_num(pci_dev),
+                                     pci_dev->devfn);
+        QLIST_INSERT_HEAD(&state->dev_list, xendev, entry);
 
         xen_map_pcidev(xen_domid, state->ioservid, pci_dev);
     }
@@ -581,8 +595,17 @@  static void xen_device_unrealize(DeviceListener *listener,
 
     if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
         PCIDevice *pci_dev = PCI_DEVICE(dev);
+        XenPciDevice *xendev, *next;
 
         xen_unmap_pcidev(xen_domid, state->ioservid, pci_dev);
+
+        QLIST_FOREACH_SAFE(xendev, &state->dev_list, entry, next) {
+            if (xendev->pci_dev == pci_dev) {
+                QLIST_REMOVE(xendev, entry);
+                g_free(xendev);
+                break;
+            }
+        }
     }
 }
 
@@ -903,6 +926,80 @@  static void cpu_ioreq_move(ioreq_t *req)
     }
 }
 
+static void rw_config_req_item(XenPciDevice *xendev, ioreq_t *req,
+                               uint32_t i, uint32_t *val)
+{
+    int32_t reg = req->addr;
+    uint32_t offset = req->size * i;
+
+    reg += (req->df ? -1 : 1) * offset;
+    if (reg < 0 || reg > PCI_CONFIG_SPACE_SIZE) {
+        if (req->dir == IOREQ_READ) {
+            *val = ~0u;
+        }
+        return;
+    }
+
+    if (req->dir == IOREQ_READ) {
+        *val = pci_host_config_read_common(xendev->pci_dev, reg,
+                                           PCI_CONFIG_SPACE_SIZE,
+                                           req->size);
+        trace_cpu_ioreq_config_read(req, xendev->sbdf, reg,
+                                    req->size, *val);
+    } else {
+        trace_cpu_ioreq_config_write(req, xendev->sbdf, reg, req->size,
+                                     *val);
+        pci_host_config_write_common(xendev->pci_dev, reg,
+                                     PCI_CONFIG_SPACE_SIZE, *val,
+                                     req->size);
+    }
+}
+
+static void cpu_ioreq_config(XenIOState *state, ioreq_t *req)
+{
+    uint32_t sbdf = req->addr >> 32;
+    XenPciDevice *xendev;
+
+    if (req->size > sizeof(uint32_t)) {
+        hw_error("PCI config access: bad size (%u)", req->size);
+    }
+
+    QLIST_FOREACH(xendev, &state->dev_list, entry) {
+        unsigned int i;
+        uint32_t tmp;
+
+        if (xendev->sbdf != sbdf) {
+            continue;
+        }
+
+        if (!req->data_is_ptr) {
+            if (req->dir == IOREQ_READ) {
+                for (i = 0; i < req->count; i++) {
+                    rw_config_req_item(xendev, req, i, &tmp);
+                    req->data = tmp;
+                }
+            } else if (req->dir == IOREQ_WRITE) {
+                for (i = 0; i < req->count; i++) {
+                    tmp = req->data;
+                    rw_config_req_item(xendev, req, i, &tmp);
+                }
+            }
+        } else {
+            if (req->dir == IOREQ_READ) {
+                for (i = 0; i < req->count; i++) {
+                    rw_config_req_item(xendev, req, i, &tmp);
+                    write_phys_req_item(req->data, req, i, &tmp);
+                }
+            } else if (req->dir == IOREQ_WRITE) {
+                for (i = 0; i < req->count; i++) {
+                    read_phys_req_item(req->data, req, i, &tmp);
+                    rw_config_req_item(xendev, req, i, &tmp);
+                }
+            }
+        }
+    }
+}
+
 static void regs_to_cpu(vmware_regs_t *vmport_regs, ioreq_t *req)
 {
     X86CPU *cpu;
@@ -975,27 +1072,9 @@  static void handle_ioreq(XenIOState *state, ioreq_t *req)
         case IOREQ_TYPE_INVALIDATE:
             xen_invalidate_map_cache();
             break;
-        case IOREQ_TYPE_PCI_CONFIG: {
-            uint32_t sbdf = req->addr >> 32;
-            uint32_t val;
-
-            /* Fake a write to port 0xCF8 so that
-             * the config space access will target the
-             * correct device model.
-             */
-            val = (1u << 31) |
-                  ((req->addr & 0x0f00) << 16) |
-                  ((sbdf & 0xffff) << 8) |
-                  (req->addr & 0xfc);
-            do_outp(0xcf8, 4, val);
-
-            /* Now issue the config space access via
-             * port 0xCFC
-             */
-            req->addr = 0xcfc | (req->addr & 0x03);
-            cpu_ioreq_pio(req);
+        case IOREQ_TYPE_PCI_CONFIG:
+            cpu_ioreq_config(state, req);
             break;
-        }
         default:
             hw_error("Invalid ioreq type 0x%x\n", req->type);
     }
@@ -1366,6 +1445,7 @@  void xen_hvm_init(PCMachineState *pcms, MemoryRegion **ram_memory)
     memory_listener_register(&state->io_listener, &address_space_io);
 
     state->device_listener = xen_device_listener;
+    QLIST_INIT(&state->dev_list);
     device_listener_register(&state->device_listener);
 
     /* Initialize backend core & drivers */