[v5,09/11] vpci/msi: add MSI handlers

Message ID 20170814142850.39133-10-roger.pau@citrix.com (mailing list archive)
State New, archived

Commit Message

Roger Pau Monné Aug. 14, 2017, 2:28 p.m. UTC
Add handlers for the MSI control, address, data and mask fields in
order to detect accesses to them and set up the interrupts as requested
by the guest.

Note that the pending register is not trapped, and the guest can
freely read/write to it.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Paul Durrant <paul.durrant@citrix.com>
---
Changes since v4:
 - Fix commit message.
 - Change the ASSERTs in vpci_msi_arch_mask into ifs.
 - Introduce INVALID_PIRQ.
 - Destroy the partially created bindings in case of failure in
   vpci_msi_arch_enable.
 - Just take the pcidevs lock once in vpci_msi_arch_disable.
 - Print an error message in case of failure of pt_irq_destroy_bind.
 - Make vpci_msi_arch_init return void.
 - Constify the arch parameter of vpci_msi_arch_print.
 - Use fixed instead of cpu for msi redirection.
 - Separate the header includes in vpci/msi.c between xen and asm.
 - Store the number of configured vectors even if MSI is not enabled
   and always return it in vpci_msi_control_read.
 - Fix/add comments in vpci_msi_control_write to clarify intended
   behavior.
 - Simplify usage of masks in vpci_msi_address_{upper_}write.
 - Add comment to vpci_msi_mask_{read/write}.
 - Don't use MASK_EXTR in vpci_msi_mask_write.
 - s/msi_offset/pos/ in vpci_init_msi.
 - Move control variable setup closer to its usage.
 - Use d%d in vpci_dump_msi.
 - Fix printing of bitfield mask in vpci_dump_msi.
 - Fix definition of MSI_ADDR_REDIRECTION_MASK.
 - Shuffle the layout of vpci_msi to minimize gaps.
 - Remove the error label in vpci_init_msi.

Changes since v3:
 - Propagate changes from previous versions: drop xen_ prefix, drop
   return value from handlers, use the new vpci_val fields.
 - Use MASK_EXTR.
 - Remove the usage of GENMASK.
 - Add GFLAGS_SHIFT_DEST_ID and use it in msi_flags.
 - Add "arch" to the MSI arch specific functions.
 - Move the dumping of vPCI MSI information to dump_msi (key 'M').
 - Remove the guest_vectors field.
 - Allow the guest to change the number of active vectors without
   having to disable and enable MSI.
 - Check the number of active vectors when parsing the disable
   mask.
 - Remove the debug messages from vpci_init_msi.
 - Move the arch-specific part of the dump handler to x86/hvm/vmsi.c.
 - Use trylock in the dump handler to get the vpci lock.

Changes since v2:
 - Add an arch-specific abstraction layer. Note that this is only implemented
   for x86 currently.
 - Add a wrapper to detect MSI enabling for vPCI.

NB: I've only been able to test this with devices using a single MSI interrupt
and no mask register. I will try to find hardware that supports the mask
register and more than one vector, but I cannot make any promises.

If there are doubts about the untested parts we could always force Xen to
report no per-vector masking support and only 1 available vector, but I would
rather avoid doing it.
---
 xen/arch/x86/hvm/vmsi.c      | 156 ++++++++++++++++++
 xen/arch/x86/msi.c           |   3 +
 xen/drivers/vpci/Makefile    |   2 +-
 xen/drivers/vpci/msi.c       | 368 +++++++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/hvm/io.h |  18 +++
 xen/include/asm-x86/msi.h    |   1 +
 xen/include/xen/hvm/irq.h    |   2 +
 xen/include/xen/irq.h        |   1 +
 xen/include/xen/vpci.h       |  27 ++++
 9 files changed, 577 insertions(+), 1 deletion(-)
 create mode 100644 xen/drivers/vpci/msi.c

Comments

Paul Durrant Aug. 22, 2017, 12:20 p.m. UTC | #1
> -----Original Message-----
> From: Roger Pau Monne [mailto:roger.pau@citrix.com]
> Sent: 14 August 2017 15:29
> To: xen-devel@lists.xenproject.org
> Cc: boris.ostrovsky@oracle.com; konrad.wilk@oracle.com; Roger Pau Monne
> <roger.pau@citrix.com>; Jan Beulich <jbeulich@suse.com>; Andrew Cooper
> <Andrew.Cooper3@citrix.com>; Paul Durrant <Paul.Durrant@citrix.com>
> Subject: [PATCH v5 09/11] vpci/msi: add MSI handlers
>
> Add handlers for the MSI control, address, data and mask fields in
> order to detect accesses to them and set up the interrupts as requested
> by the guest.
>
> Note that the pending register is not trapped, and the guest can
> freely read/write to it.
>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Reviewed-by: Paul Durrant <paul.durrant@citrix.com>


Jan Beulich Sept. 7, 2017, 3:29 p.m. UTC | #2
>>> On 14.08.17 at 16:28, <roger.pau@citrix.com> wrote:
> +static unsigned int msi_flags(uint16_t data, uint64_t addr)
> +{
> +    unsigned int rh, dm, dest_id, deliv_mode, trig_mode;
> +
> +    rh = MASK_EXTR(addr, MSI_ADDR_REDIRECTION_MASK);
> +    dm = MASK_EXTR(addr, MSI_ADDR_DESTMODE_MASK);
> +    dest_id = MASK_EXTR(addr, MSI_ADDR_DEST_ID_MASK);
> +    deliv_mode = MASK_EXTR(data, MSI_DATA_DELIVERY_MODE_MASK);
> +    trig_mode = MASK_EXTR(data, MSI_DATA_TRIGGER_MASK);
> +
> +    return (dest_id << GFLAGS_SHIFT_DEST_ID) | (rh << GFLAGS_SHIFT_RH) |
> +           (dm << GFLAGS_SHIFT_DM) | (deliv_mode << GFLAGS_SHIFT_DELIV_MODE) |
> +           (trig_mode << GFLAGS_SHIFT_TRG_MODE);

This would need re-basing over the removal of those GFLAGS_*
values anyway, but do you really mean those? There's no domctl
involved here I think, and hence I think you rather mean the x86
architecture-mandated macros found in asm-x86/msi.h. Or wait,
no, I've just found the caller - you instead want to name the
function msi_gflags() and perhaps add a comment on why you
need to use the domctl constants here.
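
A sketch of the suggested rename plus comment (hypothetical wording):

    /*
     * The result is in the layout used by the pt_irq_create_bind()
     * domctl, hence the GFLAGS_* constants rather than the x86 MSI ones.
     */
    static unsigned int msi_gflags(uint16_t data, uint64_t addr)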

> +void vpci_msi_arch_mask(struct vpci_arch_msi *arch, struct pci_dev *pdev,

Presumably constification of pdev here and elsewhere will come as
a result of doing so at the callback layer. If not, helper functions
not needing to alter it should have it const nevertheless.

> +                        unsigned int entry, bool mask)
> +{
> +    struct domain *d = pdev->domain;

This one probably can be const too.

> +int vpci_msi_arch_enable(struct vpci_arch_msi *arch, struct pci_dev *pdev,
> +                         uint64_t address, uint32_t data, unsigned int vectors)
> +{
> +    struct msi_info msi_info = {
> +        .seg = pdev->seg,
> +        .bus = pdev->bus,
> +        .devfn = pdev->devfn,
> +        .entry_nr = vectors,
> +    };
> +    unsigned int i;
> +    int rc;
> +
> +    ASSERT(arch->pirq == INVALID_PIRQ);
> +
> +    /* Get a PIRQ. */
> +    rc = allocate_and_map_msi_pirq(pdev->domain, -1, &arch->pirq,
> +                                   MAP_PIRQ_TYPE_MULTI_MSI, &msi_info);
> +    if ( rc )
> +    {
> +        gdprintk(XENLOG_ERR, "%04x:%02x:%02x.%u: failed to map PIRQ: %d\n",
> +                 pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
> +                 PCI_FUNC(pdev->devfn), rc);
> +        return rc;
> +    }
> +
> +    for ( i = 0; i < vectors; i++ )
> +    {
> +        xen_domctl_bind_pt_irq_t bind = {
> +            .machine_irq = arch->pirq + i,
> +            .irq_type = PT_IRQ_TYPE_MSI,
> +            .u.msi.gvec = msi_vector(data) + i,

Isn't that rather msi_vector(data + i), i.e. wouldn't you better
increment data together with i?

> +            .u.msi.gflags = msi_flags(data, address),
> +        };
> +
> +        pcidevs_lock();
> +        rc = pt_irq_create_bind(pdev->domain, &bind);
> +        if ( rc )
> +        {
> +            gdprintk(XENLOG_ERR,
> +                     "%04x:%02x:%02x.%u: failed to bind PIRQ %u: %d\n",
> +                     pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
> +                     PCI_FUNC(pdev->devfn), arch->pirq + i, rc);
> +            while ( bind.machine_irq-- )
> +                pt_irq_destroy_bind(pdev->domain, &bind);
> +            spin_lock(&pdev->domain->event_lock);
> +            unmap_domain_pirq(pdev->domain, arch->pirq);
> +            spin_unlock(&pdev->domain->event_lock);
> +            pcidevs_unlock();
> +            arch->pirq = -1;

INVALID_PIRQ ?
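
I.e. (sketch):

    -            arch->pirq = -1;
    +            arch->pirq = INVALID_PIRQ;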

> +int vpci_msi_arch_disable(struct vpci_arch_msi *arch, struct pci_dev *pdev,
> +                          unsigned int vectors)
> +{
> +    unsigned int i;
> +
> +    ASSERT(arch->pirq != INVALID_PIRQ);
> +
> +    pcidevs_lock();
> +    for ( i = 0; i < vectors; i++ )
> +    {
> +        xen_domctl_bind_pt_irq_t bind = {
> +            .machine_irq = arch->pirq + i,
> +            .irq_type = PT_IRQ_TYPE_MSI,
> +        };
> +        int rc;
> +
> +        rc = pt_irq_destroy_bind(pdev->domain, &bind);
> +        gdprintk(XENLOG_ERR,
> +                 "%04x:%02x:%02x.%u: failed to unbind PIRQ %u: %d\n",
> +                 pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
> +                 PCI_FUNC(pdev->devfn), arch->pirq + i, rc);

Couldn't this be ASSERT(!rc)? Afaics all error paths in that function
are due to passing in invalid state or information.
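
I.e., a sketch of the suggestion (the unconditional error print would then
become unnecessary):

    rc = pt_irq_destroy_bind(pdev->domain, &bind);
    ASSERT(!rc);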

> +static int vpci_msi_disable(struct pci_dev *pdev, struct vpci_msi *msi)
> +{
> +    int ret;
> +
> +    ASSERT(msi->enabled);
> +    __msi_set_enable(pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
> +                     PCI_FUNC(pdev->devfn), msi->pos, 0);
> +
> +    ret = vpci_msi_arch_disable(&msi->arch, pdev, msi->vectors);
> +    if ( ret )
> +        return ret;
> +
> +    msi->enabled = false;
> +
> +    return 0;
> +}

Please invert the if() condition and have both cases reach the final
return.
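
A minimal sketch of the requested shape:

    ret = vpci_msi_arch_disable(&msi->arch, pdev, msi->vectors);
    if ( !ret )
        msi->enabled = false;

    return ret;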

> +static void vpci_msi_control_write(struct pci_dev *pdev, unsigned int reg,
> +                                   uint32_t val, void *data)
> +{
> +    struct vpci_msi *msi = data;
> +    unsigned int vectors = 1 << MASK_EXTR(val, PCI_MSI_FLAGS_QSIZE);
> +    bool new_enabled = val & PCI_MSI_FLAGS_ENABLE;
> +
> +    if ( vectors > msi->max_vectors )
> +        vectors = msi->max_vectors;
> +
> +    /*
> +     * No change in the enable field and the number of vectors is

s/ in / if / ?

> +static int vpci_init_msi(struct pci_dev *pdev)
> +{
> +    uint8_t seg = pdev->seg, bus = pdev->bus;
> +    uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
> +    struct vpci_msi *msi;
> +    unsigned int pos;
> +    uint16_t control;
> +    int ret;
> +
> +    pos = pci_find_cap_offset(seg, bus, slot, func, PCI_CAP_ID_MSI);
> +    if ( !pos )
> +        return 0;
> +
> +    msi = xzalloc(struct vpci_msi);
> +    if ( !msi )
> +        return -ENOMEM;
> +
> +    msi->pos = pos;
> +
> +    ret = vpci_add_register(pdev, vpci_msi_control_read,
> +                            vpci_msi_control_write,
> +                            msi_control_reg(pos), 2, msi);
> +    if ( ret )
> +    {
> +        xfree(msi);
> +        return ret;
> +    }
> +
> +    /* Get the maximum number of vectors the device supports. */
> +    control = pci_conf_read16(seg, bus, slot, func, msi_control_reg(pos));
> +    msi->max_vectors = multi_msi_capable(control);
> +    ASSERT(msi->max_vectors <= 32);
> +
> +    /* The multiple message enable is 0 after reset (1 message enabled). */
> +    msi->vectors = 1;
> +
> +    /* No PIRQ bound yet. */
> +    vpci_msi_arch_init(&msi->arch);

I think it is generally a good idea to hand the entire thing to the
arch hook, not just the ->arch portion. The hook may want to
read other fields in order to establish the per-arch ones. Just
think of something depending on the maximum vector count.
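
For illustration, a hypothetical reshaping where the hook receives the
whole structure (and could thus consult e.g. msi->max_vectors):

    void vpci_msi_arch_init(struct vpci_msi *msi)
    {
        msi->arch.pirq = INVALID_PIRQ;
    }

with the caller then becoming vpci_msi_arch_init(msi).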

> +    msi->address64 = is_64bit_address(control) ? true : false;
> +    msi->masking = is_mask_bit_support(control) ? true : false;

What are these conditional operators good for?
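
Presumably meaning plain assignments, since the conversion to bool
already normalizes the value (sketch):

    msi->address64 = is_64bit_address(control);
    msi->masking = is_mask_bit_support(control);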

> +}
> +
> +REGISTER_VPCI_INIT(vpci_init_msi);

Btw, it's certainly a matter of taste, but would you mind leaving out
the blank line between the function and such registration macro
invocations? We commonly (but not fully uniformly) do so e.g. for
custom_param() invocations, too.

> +void vpci_dump_msi(void)
> +{
> +    struct domain *d;
> +
> +    for_each_domain ( d )
> +    {
> +        const struct pci_dev *pdev;
> +
> +        if ( !has_vpci(d) )
> +            continue;
> +
> +        printk("vPCI MSI information for d%d\n", d->domain_id);
> +
> +        if ( !vpci_tryrlock(d) )
> +        {
> +            printk("Unable to get vPCI lock, skipping\n");
> +            continue;
> +        }
> +
> +        list_for_each_entry ( pdev, &d->arch.pdev_list, domain_list )
> +        {
> +            uint8_t seg = pdev->seg, bus = pdev->bus;
> +            uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
> +            const struct vpci_msi *msi = pdev->vpci->msi;
> +
> +            if ( msi )
> +            {
> +                printk("Device %04x:%02x:%02x.%u\n", seg, bus, slot, func);
> +
> +                printk("  Enabled: %u Supports masking: %u 64-bit addresses: %u\n",
> +                       msi->enabled, msi->masking, msi->address64);
> +                printk("  Max vectors: %u enabled vectors: %u\n",
> +                       msi->max_vectors, msi->vectors);
> +
> +                vpci_msi_arch_print(&msi->arch, msi->data, msi->address);
> +
> +                if ( msi->masking )
> +                    printk("  mask=%08x\n", msi->mask);
> +            }
> +        }
> +        vpci_runlock(d);
> +    }
> +}

There's no process_pending_softirqs() in here, despite the nested
loops potentially being long-lasting.
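
A sketch of one possible placement (hypothetical, at the top of the
domain loop):

    @@ void vpci_dump_msi(void) @@
         for_each_domain ( d )
         {
             const struct pci_dev *pdev;

    +        /* The nested loops can be long; keep softirqs serviced. */
    +        process_pending_softirqs();
    +
             if ( !has_vpci(d) )
                 continue;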

> --- a/xen/include/xen/irq.h
> +++ b/xen/include/xen/irq.h
> @@ -133,6 +133,7 @@ struct pirq {
>      struct arch_pirq arch;
>  };
>  
> +#define INVALID_PIRQ -1

This needs parentheses.
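
I.e.:

    #define INVALID_PIRQ (-1)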

Jan
Roger Pau Monné Sept. 14, 2017, 10:08 a.m. UTC | #3
On Thu, Sep 07, 2017 at 09:29:41AM -0600, Jan Beulich wrote:
> >>> On 14.08.17 at 16:28, <roger.pau@citrix.com> wrote:
> > +int vpci_msi_arch_enable(struct vpci_arch_msi *arch, struct pci_dev *pdev,
> > +                         uint64_t address, uint32_t data, unsigned int vectors)
> > +{
> > +    struct msi_info msi_info = {
> > +        .seg = pdev->seg,
> > +        .bus = pdev->bus,
> > +        .devfn = pdev->devfn,
> > +        .entry_nr = vectors,
> > +    };
> > +    unsigned int i;
> > +    int rc;
> > +
> > +    ASSERT(arch->pirq == INVALID_PIRQ);
> > +
> > +    /* Get a PIRQ. */
> > +    rc = allocate_and_map_msi_pirq(pdev->domain, -1, &arch->pirq,
> > +                                   MAP_PIRQ_TYPE_MULTI_MSI, &msi_info);
> > +    if ( rc )
> > +    {
> > +        gdprintk(XENLOG_ERR, "%04x:%02x:%02x.%u: failed to map PIRQ: %d\n",
> > +                 pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
> > +                 PCI_FUNC(pdev->devfn), rc);
> > +        return rc;
> > +    }
> > +
> > +    for ( i = 0; i < vectors; i++ )
> > +    {
> > +        xen_domctl_bind_pt_irq_t bind = {
> > +            .machine_irq = arch->pirq + i,
> > +            .irq_type = PT_IRQ_TYPE_MSI,
> > +            .u.msi.gvec = msi_vector(data) + i,
> 
> Isn't that rather msi_vector(data + i), i.e. wouldn't you better
> increment data together with i?

That's true, because the vector is fetched from the last 8 bits of the
data, but I find it more confusing (and it requires that the reader
knows this detail). IMHO I would prefer to leave it as-is.

Thanks, Roger.
Jan Beulich Sept. 14, 2017, 10:19 a.m. UTC | #4
>>> On 14.09.17 at 12:08, <roger.pau@citrix.com> wrote:
> On Thu, Sep 07, 2017 at 09:29:41AM -0600, Jan Beulich wrote:
>> >>> On 14.08.17 at 16:28, <roger.pau@citrix.com> wrote:
>> > +int vpci_msi_arch_enable(struct vpci_arch_msi *arch, struct pci_dev *pdev,
>> > +                         uint64_t address, uint32_t data, unsigned int vectors)
>> > +{
>> > +    struct msi_info msi_info = {
>> > +        .seg = pdev->seg,
>> > +        .bus = pdev->bus,
>> > +        .devfn = pdev->devfn,
>> > +        .entry_nr = vectors,
>> > +    };
>> > +    unsigned int i;
>> > +    int rc;
>> > +
>> > +    ASSERT(arch->pirq == INVALID_PIRQ);
>> > +
>> > +    /* Get a PIRQ. */
>> > +    rc = allocate_and_map_msi_pirq(pdev->domain, -1, &arch->pirq,
>> > +                                   MAP_PIRQ_TYPE_MULTI_MSI, &msi_info);
>> > +    if ( rc )
>> > +    {
>> > +        gdprintk(XENLOG_ERR, "%04x:%02x:%02x.%u: failed to map PIRQ: %d\n",
>> > +                 pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
>> > +                 PCI_FUNC(pdev->devfn), rc);
>> > +        return rc;
>> > +    }
>> > +
>> > +    for ( i = 0; i < vectors; i++ )
>> > +    {
>> > +        xen_domctl_bind_pt_irq_t bind = {
>> > +            .machine_irq = arch->pirq + i,
>> > +            .irq_type = PT_IRQ_TYPE_MSI,
>> > +            .u.msi.gvec = msi_vector(data) + i,
>> 
>> Isn't that rather msi_vector(data + i), i.e. wouldn't you better
>> increment data together with i?
> 
> That's true, because the vector is fetched from the last 8 bits of the
> data, but I find it more confusing (and it requires that the reader
> knows this detail). IMHO I would prefer to leave it as-is.

No, the problem is the wrap-around case, which your code
doesn't handle. Iirc hardware behaves along the lines of what
I've suggested to change to, with potentially the vector
increment carrying into other parts of the value. Hence you
either need an early check for there not being any wrapping,
or other places may need similar adjustment (in which case it
might be better to really just increment "data" once in the
loop).

Jan
Roger Pau Monné Sept. 14, 2017, 10:42 a.m. UTC | #5
On Thu, Sep 14, 2017 at 04:19:44AM -0600, Jan Beulich wrote:
> >>> On 14.09.17 at 12:08, <roger.pau@citrix.com> wrote:
> > On Thu, Sep 07, 2017 at 09:29:41AM -0600, Jan Beulich wrote:
> >> >>> On 14.08.17 at 16:28, <roger.pau@citrix.com> wrote:
> >> > +int vpci_msi_arch_enable(struct vpci_arch_msi *arch, struct pci_dev *pdev,
> >> > +                         uint64_t address, uint32_t data, unsigned int vectors)
> >> > +{
> >> > +    struct msi_info msi_info = {
> >> > +        .seg = pdev->seg,
> >> > +        .bus = pdev->bus,
> >> > +        .devfn = pdev->devfn,
> >> > +        .entry_nr = vectors,
> >> > +    };
> >> > +    unsigned int i;
> >> > +    int rc;
> >> > +
> >> > +    ASSERT(arch->pirq == INVALID_PIRQ);
> >> > +
> >> > +    /* Get a PIRQ. */
> >> > +    rc = allocate_and_map_msi_pirq(pdev->domain, -1, &arch->pirq,
> >> > +                                   MAP_PIRQ_TYPE_MULTI_MSI, &msi_info);
> >> > +    if ( rc )
> >> > +    {
> >> > +        gdprintk(XENLOG_ERR, "%04x:%02x:%02x.%u: failed to map PIRQ: %d\n",
> >> > +                 pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
> >> > +                 PCI_FUNC(pdev->devfn), rc);
> >> > +        return rc;
> >> > +    }
> >> > +
> >> > +    for ( i = 0; i < vectors; i++ )
> >> > +    {
> >> > +        xen_domctl_bind_pt_irq_t bind = {
> >> > +            .machine_irq = arch->pirq + i,
> >> > +            .irq_type = PT_IRQ_TYPE_MSI,
> >> > +            .u.msi.gvec = msi_vector(data) + i,
> >> 
> >> Isn't that rather msi_vector(data + i), i.e. wouldn't you better
> >> increment data together with i?
> > 
> > That's true, because the vector is fetched from the last 8 bits of the
> > data, but I find it more confusing (and it requires that the reader
> > knows this detail). IMHO I would prefer to leave it as-is.
> 
> No, the problem is the wrap-around case, which your code
> doesn't handle. Iirc hardware behaves along the lines of what
> I've suggested to change to, with potentially the vector
> increment carrying into other parts of the value. Hence you
> either need an early check for there not being any wrapping,
> or other places may need similar adjustment (in which case it
> might be better to really just increment "data" once in the
> >> loop).

Oh, so the vector increment carries over to the delivery mode, then I
will switch it.

Thanks, Roger.
Jan Beulich Sept. 14, 2017, 10:50 a.m. UTC | #6
>>> On 14.09.17 at 12:42, <roger.pau@citrix.com> wrote:
> On Thu, Sep 14, 2017 at 04:19:44AM -0600, Jan Beulich wrote:
>> >>> On 14.09.17 at 12:08, <roger.pau@citrix.com> wrote:
>> > On Thu, Sep 07, 2017 at 09:29:41AM -0600, Jan Beulich wrote:
>> >> >>> On 14.08.17 at 16:28, <roger.pau@citrix.com> wrote:
>> >> > +int vpci_msi_arch_enable(struct vpci_arch_msi *arch, struct pci_dev 
> *pdev,
>> >> > +                         uint64_t address, uint32_t data, unsigned int 
> vectors)
>> >> > +{
>> >> > +    struct msi_info msi_info = {
>> >> > +        .seg = pdev->seg,
>> >> > +        .bus = pdev->bus,
>> >> > +        .devfn = pdev->devfn,
>> >> > +        .entry_nr = vectors,
>> >> > +    };
>> >> > +    unsigned int i;
>> >> > +    int rc;
>> >> > +
>> >> > +    ASSERT(arch->pirq == INVALID_PIRQ);
>> >> > +
>> >> > +    /* Get a PIRQ. */
>> >> > +    rc = allocate_and_map_msi_pirq(pdev->domain, -1, &arch->pirq,
>> >> > +                                   MAP_PIRQ_TYPE_MULTI_MSI, &msi_info);
>> >> > +    if ( rc )
>> >> > +    {
>> >> > +        gdprintk(XENLOG_ERR, "%04x:%02x:%02x.%u: failed to map PIRQ: 
> %d\n",
>> >> > +                 pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
>> >> > +                 PCI_FUNC(pdev->devfn), rc);
>> >> > +        return rc;
>> >> > +    }
>> >> > +
>> >> > +    for ( i = 0; i < vectors; i++ )
>> >> > +    {
>> >> > +        xen_domctl_bind_pt_irq_t bind = {
>> >> > +            .machine_irq = arch->pirq + i,
>> >> > +            .irq_type = PT_IRQ_TYPE_MSI,
>> >> > +            .u.msi.gvec = msi_vector(data) + i,
>> >> 
>> >> Isn't that rather msi_vector(data + i), i.e. wouldn't you better
>> >> increment data together with i?
>> > 
>> > That's true, because the vector is fetched from the last 8 bits of the
>> > data, but I find it more confusing (and it requires that the reader
>> > knows this detail). IMHO I would prefer to leave it as-is.
>> 
>> No, the problem is the wrap-around case, which your code
>> doesn't handle. Iirc hardware behaves along the lines of what
>> I've suggested to change to, with potentially the vector
>> increment carrying into other parts of the value. Hence you
>> either need an early check for there not being any wrapping,
>> or other places may need similar adjustment (in which case it
>> might be better to really just increment "data" once in the
>> loop).
> 
> Oh, so the vector increment carries over to the delivery mode, then I
> will switch it.

But please double check first that I'm not mis-remembering.

Jan
Roger Pau Monné Sept. 14, 2017, 11:35 a.m. UTC | #7
On Thu, Sep 14, 2017 at 04:50:10AM -0600, Jan Beulich wrote:
> >>> On 14.09.17 at 12:42, <roger.pau@citrix.com> wrote:
> > On Thu, Sep 14, 2017 at 04:19:44AM -0600, Jan Beulich wrote:
> >> >>> On 14.09.17 at 12:08, <roger.pau@citrix.com> wrote:
> >> > On Thu, Sep 07, 2017 at 09:29:41AM -0600, Jan Beulich wrote:
> >> >> >>> On 14.08.17 at 16:28, <roger.pau@citrix.com> wrote:
> >> >> > +int vpci_msi_arch_enable(struct vpci_arch_msi *arch, struct pci_dev 
> > *pdev,
> >> >> > +                         uint64_t address, uint32_t data, unsigned int 
> > vectors)
> >> >> > +{
> >> >> > +    struct msi_info msi_info = {
> >> >> > +        .seg = pdev->seg,
> >> >> > +        .bus = pdev->bus,
> >> >> > +        .devfn = pdev->devfn,
> >> >> > +        .entry_nr = vectors,
> >> >> > +    };
> >> >> > +    unsigned int i;
> >> >> > +    int rc;
> >> >> > +
> >> >> > +    ASSERT(arch->pirq == INVALID_PIRQ);
> >> >> > +
> >> >> > +    /* Get a PIRQ. */
> >> >> > +    rc = allocate_and_map_msi_pirq(pdev->domain, -1, &arch->pirq,
> >> >> > +                                   MAP_PIRQ_TYPE_MULTI_MSI, &msi_info);
> >> >> > +    if ( rc )
> >> >> > +    {
> >> >> > +        gdprintk(XENLOG_ERR, "%04x:%02x:%02x.%u: failed to map PIRQ: 
> > %d\n",
> >> >> > +                 pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
> >> >> > +                 PCI_FUNC(pdev->devfn), rc);
> >> >> > +        return rc;
> >> >> > +    }
> >> >> > +
> >> >> > +    for ( i = 0; i < vectors; i++ )
> >> >> > +    {
> >> >> > +        xen_domctl_bind_pt_irq_t bind = {
> >> >> > +            .machine_irq = arch->pirq + i,
> >> >> > +            .irq_type = PT_IRQ_TYPE_MSI,
> >> >> > +            .u.msi.gvec = msi_vector(data) + i,
> >> >> 
> >> >> Isn't that rather msi_vector(data + i), i.e. wouldn't you better
> >> >> increment data together with i?
> >> > 
> >> > That's true, because the vector is fetched from the last 8 bits of the
> >> > data, but I find it more confusing (and it requires that the reader
> >> > knows this detail). IMHO I would prefer to leave it as-is.
> >> 
> >> No, the problem is the wrap-around case, which your code
> >> doesn't handle. Iirc hardware behaves along the lines of what
> >> I've suggested to change to, with potentially the vector
> >> increment carrying into other parts of the value. Hence you
> >> either need an early check for there not being any wrapping,
> >> or other places may need similar adjustment (in which case it
> >> might be better to really just increment "data" once in the
> >> loop).
> > 
> > Oh, so the vector increment carries over to the delivery mode, then I
> > will switch it.
> 
> But please double check first that I'm not mis-remembering.

The PCI spec contains the following about the data field:

The Multiple Message Enable field (bits 6-4 of the Message Control
register) defines the number of low order message data bits the
function is permitted to modify to generate its system software
allocated vectors. For example, a Multiple Message Enable encoding of
“010” indicates the function has been allocated four vectors and is
permitted to modify message data bits 1 and 0 (a function modifies the
lower message data bits to generate the allocated number of vectors).
If the Multiple Message Enable field is “000”, the function is not
permitted to modify the message data.

So it seems like the overflow should be limited to the number of
enabled vectors, ie: maybe store the vector in a uint8_t and increase
it at every loop?

I don't seem to be able to find any specific mention of what happens
when the vector part of the data register overflows. From the excerpt
above it seems like it should never modify any other bits apart from
the low 8-bit vector ones.

The Intel SDM mentions that the vector must be in the range 0x10-0xfe,
in which case the resulting vector (whether we wrap or not) is not
going to be valid anyway.

Thanks, Roger.
Jan Beulich Sept. 14, 2017, 12:09 p.m. UTC | #8
>>> On 14.09.17 at 13:35, <roger.pau@citrix.com> wrote:
> On Thu, Sep 14, 2017 at 04:50:10AM -0600, Jan Beulich wrote:
>> >>> On 14.09.17 at 12:42, <roger.pau@citrix.com> wrote:
>> > On Thu, Sep 14, 2017 at 04:19:44AM -0600, Jan Beulich wrote:
>> >> >>> On 14.09.17 at 12:08, <roger.pau@citrix.com> wrote:
>> >> > On Thu, Sep 07, 2017 at 09:29:41AM -0600, Jan Beulich wrote:
>> >> >> >>> On 14.08.17 at 16:28, <roger.pau@citrix.com> wrote:
>> >> >> > +int vpci_msi_arch_enable(struct vpci_arch_msi *arch, struct pci_dev 
>> > *pdev,
>> >> >> > +                         uint64_t address, uint32_t data, unsigned int 
>> > vectors)
>> >> >> > +{
>> >> >> > +    struct msi_info msi_info = {
>> >> >> > +        .seg = pdev->seg,
>> >> >> > +        .bus = pdev->bus,
>> >> >> > +        .devfn = pdev->devfn,
>> >> >> > +        .entry_nr = vectors,
>> >> >> > +    };
>> >> >> > +    unsigned int i;
>> >> >> > +    int rc;
>> >> >> > +
>> >> >> > +    ASSERT(arch->pirq == INVALID_PIRQ);
>> >> >> > +
>> >> >> > +    /* Get a PIRQ. */
>> >> >> > +    rc = allocate_and_map_msi_pirq(pdev->domain, -1, &arch->pirq,
>> >> >> > +                                   MAP_PIRQ_TYPE_MULTI_MSI, &msi_info);
>> >> >> > +    if ( rc )
>> >> >> > +    {
>> >> >> > +        gdprintk(XENLOG_ERR, "%04x:%02x:%02x.%u: failed to map PIRQ: 
>> > %d\n",
>> >> >> > +                 pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
>> >> >> > +                 PCI_FUNC(pdev->devfn), rc);
>> >> >> > +        return rc;
>> >> >> > +    }
>> >> >> > +
>> >> >> > +    for ( i = 0; i < vectors; i++ )
>> >> >> > +    {
>> >> >> > +        xen_domctl_bind_pt_irq_t bind = {
>> >> >> > +            .machine_irq = arch->pirq + i,
>> >> >> > +            .irq_type = PT_IRQ_TYPE_MSI,
>> >> >> > +            .u.msi.gvec = msi_vector(data) + i,
>> >> >> 
>> >> >> Isn't that rather msi_vector(data + i), i.e. wouldn't you better
>> >> >> increment data together with i?
>> >> > 
>> >> > That's true, because the vector is fetched from the last 8 bits of the
>> >> > data, but I find it more confusing (and it requires that the reader
>> >> > knows this detail). IMHO I would prefer to leave it as-is.
>> >> 
>> >> No, the problem is the wrap-around case, which your code
>> >> doesn't handle. Iirc hardware behaves along the lines of what
>> >> I've suggested to change to, with potentially the vector
>> >> increment carrying into other parts of the value. Hence you
>> >> either need an early check for there not being any wrapping,
>> >> or other places may need similar adjustment (in which case it
>> >> might be better to really just increment "data" once in the
>> >> loop).
>> > 
>> > Oh, so the vector increment carries over to the delivery mode, then I
>> > will switch it.
>> 
>> But please double check first that I'm not mis-remembering.
> 
> The PCI spec contains the following about the data field:
> 
> The Multiple Message Enable field (bits 6-4 of the Message Control
> register) defines the number of low order message data bits the
> function is permitted to modify to generate its system software
> allocated vectors. For example, a Multiple Message Enable encoding of
> “010” indicates the function has been allocated four vectors and is
> permitted to modify message data bits 1 and 0 (a function modifies the
> lower message data bits to generate the allocated number of vectors).
> If the Multiple Message Enable field is “000”, the function is not
> permitted to modify the message data.
> 
> So it seems like the overflow should be limited to the number of
> enabled vectors,

Ah, right. So no spilling over into the higher bits.

> ie: maybe store the vector in a uint8_t and increase
> it at every loop?

A uint8_t won't help: As the text says, you need to limit the change
to the number of bits that are permitted to be altered. I think there's
an implication for these bits to be all clear for the first of the vectors,
but that's not spelled out, so I'd prefer if we were flexible and
allowed to start from a non-zero value (resulting in e.g. a
0b00111101, 0b00111110, 0b00111111, 0b00111100 vector
sequence).

Jan
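
A minimal sketch of the wrap-limited vector computation described above,
assuming "vectors" is a power of two (which the Multiple Message Enable
encoding guarantees: an encoding of n grants 2^n vectors); the helper
below and its name are illustrative only, not part of the patch:

    /*
     * Confine the per-entry increment to the low-order bits the
     * function is permitted to modify (log2(vectors) bits), so e.g. a
     * base vector of 0b00111101 with 4 vectors yields the sequence
     * 0b00111101, 0b00111110, 0b00111111, 0b00111100.
     */
    static uint8_t msi_guest_vector(uint16_t data, unsigned int i,
                                    unsigned int vectors)
    {
        uint8_t vector = data & 0xff;   /* low 8 bits hold the vector */
        uint8_t mask = vectors - 1;     /* modifiable low-order bits */

        return (vector & ~mask) | ((vector + i) & mask);
    }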

Patch

diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index a36692c313..aea088e290 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -622,3 +622,161 @@  void msix_write_completion(struct vcpu *v)
     if ( msixtbl_write(v, ctrl_address, 4, 0) != X86EMUL_OKAY )
         gdprintk(XENLOG_WARNING, "MSI-X write completion failure\n");
 }
+
+static unsigned int msi_vector(uint16_t data)
+{
+    return MASK_EXTR(data, MSI_DATA_VECTOR_MASK);
+}
+
+static unsigned int msi_flags(uint16_t data, uint64_t addr)
+{
+    unsigned int rh, dm, dest_id, deliv_mode, trig_mode;
+
+    rh = MASK_EXTR(addr, MSI_ADDR_REDIRECTION_MASK);
+    dm = MASK_EXTR(addr, MSI_ADDR_DESTMODE_MASK);
+    dest_id = MASK_EXTR(addr, MSI_ADDR_DEST_ID_MASK);
+    deliv_mode = MASK_EXTR(data, MSI_DATA_DELIVERY_MODE_MASK);
+    trig_mode = MASK_EXTR(data, MSI_DATA_TRIGGER_MASK);
+
+    return (dest_id << GFLAGS_SHIFT_DEST_ID) | (rh << GFLAGS_SHIFT_RH) |
+           (dm << GFLAGS_SHIFT_DM) | (deliv_mode << GFLAGS_SHIFT_DELIV_MODE) |
+           (trig_mode << GFLAGS_SHIFT_TRG_MODE);
+}
+
+void vpci_msi_arch_mask(struct vpci_arch_msi *arch, struct pci_dev *pdev,
+                        unsigned int entry, bool mask)
+{
+    struct domain *d = pdev->domain;
+    const struct pirq *pinfo;
+    struct irq_desc *desc;
+    unsigned long flags;
+    int irq;
+
+    ASSERT(arch->pirq >= 0);
+    pinfo = pirq_info(d, arch->pirq + entry);
+    if ( !pinfo )
+        return;
+
+    irq = pinfo->arch.irq;
+    if ( irq >= nr_irqs || irq < 0 )
+        return;
+
+    desc = irq_to_desc(irq);
+    if ( !desc )
+        return;
+
+    spin_lock_irqsave(&desc->lock, flags);
+    guest_mask_msi_irq(desc, mask);
+    spin_unlock_irqrestore(&desc->lock, flags);
+}
+
+int vpci_msi_arch_enable(struct vpci_arch_msi *arch, struct pci_dev *pdev,
+                         uint64_t address, uint32_t data, unsigned int vectors)
+{
+    struct msi_info msi_info = {
+        .seg = pdev->seg,
+        .bus = pdev->bus,
+        .devfn = pdev->devfn,
+        .entry_nr = vectors,
+    };
+    unsigned int i;
+    int rc;
+
+    ASSERT(arch->pirq == INVALID_PIRQ);
+
+    /* Get a PIRQ. */
+    rc = allocate_and_map_msi_pirq(pdev->domain, -1, &arch->pirq,
+                                   MAP_PIRQ_TYPE_MULTI_MSI, &msi_info);
+    if ( rc )
+    {
+        gdprintk(XENLOG_ERR, "%04x:%02x:%02x.%u: failed to map PIRQ: %d\n",
+                 pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
+                 PCI_FUNC(pdev->devfn), rc);
+        return rc;
+    }
+
+    for ( i = 0; i < vectors; i++ )
+    {
+        xen_domctl_bind_pt_irq_t bind = {
+            .machine_irq = arch->pirq + i,
+            .irq_type = PT_IRQ_TYPE_MSI,
+            .u.msi.gvec = msi_vector(data) + i,
+            .u.msi.gflags = msi_flags(data, address),
+        };
+
+        pcidevs_lock();
+        rc = pt_irq_create_bind(pdev->domain, &bind);
+        if ( rc )
+        {
+            gdprintk(XENLOG_ERR,
+                     "%04x:%02x:%02x.%u: failed to bind PIRQ %u: %d\n",
+                     pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
+                     PCI_FUNC(pdev->devfn), arch->pirq + i, rc);
+            /* Unbind only the entries successfully bound above. */
+            while ( bind.machine_irq-- > arch->pirq )
+                pt_irq_destroy_bind(pdev->domain, &bind);
+            spin_lock(&pdev->domain->event_lock);
+            unmap_domain_pirq(pdev->domain, arch->pirq);
+            spin_unlock(&pdev->domain->event_lock);
+            pcidevs_unlock();
+            arch->pirq = INVALID_PIRQ;
+            return rc;
+        }
+        pcidevs_unlock();
+    }
+
+    return 0;
+}
+
+int vpci_msi_arch_disable(struct vpci_arch_msi *arch, struct pci_dev *pdev,
+                          unsigned int vectors)
+{
+    unsigned int i;
+
+    ASSERT(arch->pirq != INVALID_PIRQ);
+
+    pcidevs_lock();
+    for ( i = 0; i < vectors; i++ )
+    {
+        xen_domctl_bind_pt_irq_t bind = {
+            .machine_irq = arch->pirq + i,
+            .irq_type = PT_IRQ_TYPE_MSI,
+        };
+        int rc;
+
+        rc = pt_irq_destroy_bind(pdev->domain, &bind);
+        if ( rc )
+            gdprintk(XENLOG_ERR,
+                     "%04x:%02x:%02x.%u: failed to unbind PIRQ %u: %d\n",
+                     pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
+                     PCI_FUNC(pdev->devfn), arch->pirq + i, rc);
+    }
+
+    spin_lock(&pdev->domain->event_lock);
+    unmap_domain_pirq(pdev->domain, arch->pirq);
+    spin_unlock(&pdev->domain->event_lock);
+    pcidevs_unlock();
+
+    arch->pirq = INVALID_PIRQ;
+
+    return 0;
+}
+
+void vpci_msi_arch_init(struct vpci_arch_msi *arch)
+{
+    arch->pirq = INVALID_PIRQ;
+}
+
+void vpci_msi_arch_print(const struct vpci_arch_msi *arch, uint16_t data,
+                         uint64_t addr)
+{
+    printk("vec=%#02x%7s%6s%3sassert%5s%7s dest_id=%lu pirq: %d\n",
+           MASK_EXTR(data, MSI_DATA_VECTOR_MASK),
+           data & MSI_DATA_DELIVERY_LOWPRI ? "lowest" : "fixed",
+           data & MSI_DATA_TRIGGER_LEVEL ? "level" : "edge",
+           data & MSI_DATA_LEVEL_ASSERT ? "" : "de",
+           addr & MSI_ADDR_DESTMODE_LOGIC ? "log" : "phys",
+           addr & MSI_ADDR_REDIRECTION_LOWPRI ? "lowest" : "fixed",
+           MASK_EXTR(addr, MSI_ADDR_DEST_ID_MASK),
+           arch->pirq);
+}
diff --git a/xen/arch/x86/msi.c b/xen/arch/x86/msi.c
index 77998f4fb3..63769153f1 100644
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -30,6 +30,7 @@ 
 #include <public/physdev.h>
 #include <xen/iommu.h>
 #include <xsm/xsm.h>
+#include <xen/vpci.h>
 
 static s8 __read_mostly use_msi = -1;
 boolean_param("msi", use_msi);
@@ -1536,6 +1537,8 @@  static void dump_msi(unsigned char key)
                attr.guest_masked ? 'G' : ' ',
                mask);
     }
+
+    vpci_dump_msi();
 }
 
 static int __init msi_setup_keyhandler(void)
diff --git a/xen/drivers/vpci/Makefile b/xen/drivers/vpci/Makefile
index 241467212f..62cec9e82b 100644
--- a/xen/drivers/vpci/Makefile
+++ b/xen/drivers/vpci/Makefile
@@ -1 +1 @@ 
-obj-y += vpci.o header.o
+obj-y += vpci.o header.o msi.o
diff --git a/xen/drivers/vpci/msi.c b/xen/drivers/vpci/msi.c
new file mode 100644
index 0000000000..1e36b9779a
--- /dev/null
+++ b/xen/drivers/vpci/msi.c
@@ -0,0 +1,368 @@ 
+/*
+ * Handlers for accesses to the MSI capability structure.
+ *
+ * Copyright (C) 2017 Citrix Systems R&D
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms and conditions of the GNU General Public
+ * License, version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public
+ * License along with this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/sched.h>
+#include <xen/vpci.h>
+
+#include <asm/msi.h>
+
+/* Handlers for the MSI control field (PCI_MSI_FLAGS). */
+static uint32_t vpci_msi_control_read(struct pci_dev *pdev, unsigned int reg,
+                                      const void *data)
+{
+    const struct vpci_msi *msi = data;
+    uint16_t val;
+
+    /* Set the number of supported/configured messages. */
+    val = MASK_INSR(fls(msi->max_vectors) - 1, PCI_MSI_FLAGS_QMASK);
+    val |= MASK_INSR(fls(msi->vectors) - 1, PCI_MSI_FLAGS_QSIZE);
+
+    val |= msi->enabled ? PCI_MSI_FLAGS_ENABLE : 0;
+    val |= msi->masking ? PCI_MSI_FLAGS_MASKBIT : 0;
+    val |= msi->address64 ? PCI_MSI_FLAGS_64BIT : 0;
+
+    return val;
+}
+
+static void vpci_msi_enable(struct pci_dev *pdev, struct vpci_msi *msi,
+                            unsigned int vectors)
+{
+    int ret;
+
+    ASSERT(!msi->enabled);
+    ret = vpci_msi_arch_enable(&msi->arch, pdev, msi->address, msi->data,
+                               vectors);
+    if ( ret )
+        return;
+
+    /* Apply the mask bits. */
+    if ( msi->masking )
+    {
+        unsigned int i;
+        uint32_t mask = msi->mask;
+
+        for ( i = ffs(mask) - 1; mask && i < vectors; i = ffs(mask) - 1 )
+        {
+            vpci_msi_arch_mask(&msi->arch, pdev, i, true);
+            __clear_bit(i, &mask);
+        }
+    }
+
+    __msi_set_enable(pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
+                     PCI_FUNC(pdev->devfn), msi->pos, 1);
+
+    msi->enabled = true;
+}
+
+static int vpci_msi_disable(struct pci_dev *pdev, struct vpci_msi *msi)
+{
+    int ret;
+
+    ASSERT(msi->enabled);
+    __msi_set_enable(pdev->seg, pdev->bus, PCI_SLOT(pdev->devfn),
+                     PCI_FUNC(pdev->devfn), msi->pos, 0);
+
+    ret = vpci_msi_arch_disable(&msi->arch, pdev, msi->vectors);
+    if ( ret )
+        return ret;
+
+    msi->enabled = false;
+
+    return 0;
+}
+
+static void vpci_msi_control_write(struct pci_dev *pdev, unsigned int reg,
+                                   uint32_t val, void *data)
+{
+    struct vpci_msi *msi = data;
+    unsigned int vectors = 1 << MASK_EXTR(val, PCI_MSI_FLAGS_QSIZE);
+    bool new_enabled = val & PCI_MSI_FLAGS_ENABLE;
+
+    if ( vectors > msi->max_vectors )
+        vectors = msi->max_vectors;
+
+    /*
+     * If the enable bit is unchanged, and either the number of vectors
+     * is the same or MSI is currently disabled, the vectors field can
+     * be updated directly.
+     */
+    if ( new_enabled == msi->enabled &&
+         (vectors == msi->vectors || !msi->enabled) )
+    {
+        msi->vectors = vectors;
+        return;
+    }
+
+    if ( new_enabled )
+    {
+        /*
+         * If the device is already enabled it means the number of
+         * enabled messages has changed. Disable and re-enable the
+         * device in order to apply the change.
+         */
+        if ( msi->enabled && vpci_msi_disable(pdev, msi) )
+            /*
+             * Xen was unable to disable the currently configured MSI
+             * messages; leave the device state as-is, so that the
+             * guest can try to disable MSI again.
+             */
+            return;
+
+        vpci_msi_enable(pdev, msi, vectors);
+    }
+    else
+        vpci_msi_disable(pdev, msi);
+
+    msi->vectors = vectors;
+}
+
+/* Handlers for the address field (32bit or low part of a 64bit address). */
+static uint32_t vpci_msi_address_read(struct pci_dev *pdev, unsigned int reg,
+                                      const void *data)
+{
+    const struct vpci_msi *msi = data;
+
+    return msi->address;
+}
+
+static void vpci_msi_address_write(struct pci_dev *pdev, unsigned int reg,
+                                   uint32_t val, void *data)
+{
+    struct vpci_msi *msi = data;
+
+    /* Clear low part. */
+    msi->address &= ~0xffffffffull;
+    msi->address |= val;
+}
+
+/* Handlers for the high part of a 64bit address field. */
+static uint32_t vpci_msi_address_upper_read(struct pci_dev *pdev,
+                                            unsigned int reg,
+                                            const void *data)
+{
+    const struct vpci_msi *msi = data;
+
+    return msi->address >> 32;
+}
+
+static void vpci_msi_address_upper_write(struct pci_dev *pdev, unsigned int reg,
+                                         uint32_t val, void *data)
+{
+    struct vpci_msi *msi = data;
+
+    /* Clear high part. */
+    msi->address &= 0xffffffff;
+    msi->address |= (uint64_t)val << 32;
+}
+
+/* Handlers for the data field. */
+static uint32_t vpci_msi_data_read(struct pci_dev *pdev, unsigned int reg,
+                                   const void *data)
+{
+    const struct vpci_msi *msi = data;
+
+    return msi->data;
+}
+
+static void vpci_msi_data_write(struct pci_dev *pdev, unsigned int reg,
+                                uint32_t val, void *data)
+{
+    struct vpci_msi *msi = data;
+
+    msi->data = val;
+}
+
+/* Handlers for the MSI mask bits. */
+static uint32_t vpci_msi_mask_read(struct pci_dev *pdev, unsigned int reg,
+                                   const void *data)
+{
+    const struct vpci_msi *msi = data;
+
+    return msi->mask;
+}
+
+static void vpci_msi_mask_write(struct pci_dev *pdev, unsigned int reg,
+                                uint32_t val, void *data)
+{
+    struct vpci_msi *msi = data;
+    uint32_t dmask;
+
+    dmask = msi->mask ^ val;
+
+    if ( !dmask )
+        return;
+
+    if ( msi->enabled )
+    {
+        unsigned int i;
+
+        for ( i = ffs(dmask) - 1; dmask && i < msi->vectors;
+              i = ffs(dmask) - 1 )
+        {
+            vpci_msi_arch_mask(&msi->arch, pdev, i, (val >> i) & 1);
+            __clear_bit(i, &dmask);
+        }
+    }
+
+    msi->mask = val;
+}
+
+static int vpci_init_msi(struct pci_dev *pdev)
+{
+    uint8_t seg = pdev->seg, bus = pdev->bus;
+    uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
+    struct vpci_msi *msi;
+    unsigned int pos;
+    uint16_t control;
+    int ret;
+
+    pos = pci_find_cap_offset(seg, bus, slot, func, PCI_CAP_ID_MSI);
+    if ( !pos )
+        return 0;
+
+    msi = xzalloc(struct vpci_msi);
+    if ( !msi )
+        return -ENOMEM;
+
+    msi->pos = pos;
+
+    ret = vpci_add_register(pdev, vpci_msi_control_read,
+                            vpci_msi_control_write,
+                            msi_control_reg(pos), 2, msi);
+    if ( ret )
+    {
+        xfree(msi);
+        return ret;
+    }
+
+    /* Get the maximum number of vectors the device supports. */
+    control = pci_conf_read16(seg, bus, slot, func, msi_control_reg(pos));
+    msi->max_vectors = multi_msi_capable(control);
+    ASSERT(msi->max_vectors <= 32);
+
+    /* The multiple message enable is 0 after reset (1 message enabled). */
+    msi->vectors = 1;
+
+    /* No PIRQ bound yet. */
+    vpci_msi_arch_init(&msi->arch);
+
+    msi->address64 = is_64bit_address(control);
+    msi->masking = is_mask_bit_support(control);
+
+    ret = vpci_add_register(pdev, vpci_msi_address_read,
+                            vpci_msi_address_write,
+                            msi_lower_address_reg(pos), 4, msi);
+    if ( ret )
+    {
+        xfree(msi);
+        return ret;
+    }
+
+    ret = vpci_add_register(pdev, vpci_msi_data_read, vpci_msi_data_write,
+                            msi_data_reg(pos, msi->address64), 2,
+                            msi);
+    if ( ret )
+    {
+        xfree(msi);
+        return ret;
+    }
+
+    if ( msi->address64 )
+    {
+        ret = vpci_add_register(pdev, vpci_msi_address_upper_read,
+                                vpci_msi_address_upper_write,
+                                msi_upper_address_reg(pos), 4, msi);
+        if ( ret )
+        {
+            xfree(msi);
+            return ret;
+        }
+    }
+
+    if ( msi->masking )
+    {
+        ret = vpci_add_register(pdev, vpci_msi_mask_read, vpci_msi_mask_write,
+                                msi_mask_bits_reg(pos, msi->address64), 4,
+                                msi);
+        if ( ret )
+        {
+            xfree(msi);
+            return ret;
+        }
+    }
+
+    pdev->vpci->msi = msi;
+
+    return 0;
+}
+
+REGISTER_VPCI_INIT(vpci_init_msi);
+
+void vpci_dump_msi(void)
+{
+    struct domain *d;
+
+    for_each_domain ( d )
+    {
+        const struct pci_dev *pdev;
+
+        if ( !has_vpci(d) )
+            continue;
+
+        printk("vPCI MSI information for d%d\n", d->domain_id);
+
+        if ( !vpci_tryrlock(d) )
+        {
+            printk("Unable to get vPCI lock, skipping\n");
+            continue;
+        }
+
+        list_for_each_entry ( pdev, &d->arch.pdev_list, domain_list )
+        {
+            uint8_t seg = pdev->seg, bus = pdev->bus;
+            uint8_t slot = PCI_SLOT(pdev->devfn), func = PCI_FUNC(pdev->devfn);
+            const struct vpci_msi *msi = pdev->vpci->msi;
+
+            if ( msi )
+            {
+                printk("Device %04x:%02x:%02x.%u\n", seg, bus, slot, func);
+
+                printk("  Enabled: %u Supports masking: %u 64-bit addresses: %u\n",
+                       msi->enabled, msi->masking, msi->address64);
+                printk("  Max vectors: %u enabled vectors: %u\n",
+                       msi->max_vectors, msi->vectors);
+
+                vpci_msi_arch_print(&msi->arch, msi->data, msi->address);
+
+                if ( msi->masking )
+                    printk("  mask=%08x\n", msi->mask);
+            }
+        }
+        vpci_runlock(d);
+    }
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-x86/hvm/io.h b/xen/include/asm-x86/hvm/io.h
index 837046026c..b6c5e30b6a 100644
--- a/xen/include/asm-x86/hvm/io.h
+++ b/xen/include/asm-x86/hvm/io.h
@@ -20,6 +20,7 @@ 
 #define __ASM_X86_HVM_IO_H__
 
 #include <xen/mm.h>
+#include <xen/pci.h>
 #include <asm/hvm/vpic.h>
 #include <asm/hvm/vioapic.h>
 #include <public/hvm/ioreq.h>
@@ -126,6 +127,23 @@  void hvm_dpci_eoi(struct domain *d, unsigned int guest_irq,
 void msix_write_completion(struct vcpu *);
 void msixtbl_init(struct domain *d);
 
+/* Arch-specific MSI data for vPCI. */
+struct vpci_arch_msi {
+    int pirq;
+};
+
+/* Arch-specific vPCI MSI helpers. */
+void vpci_msi_arch_mask(struct vpci_arch_msi *arch, struct pci_dev *pdev,
+                        unsigned int entry, bool mask);
+int vpci_msi_arch_enable(struct vpci_arch_msi *arch, struct pci_dev *pdev,
+                         uint64_t address, uint32_t data,
+                         unsigned int vectors);
+int vpci_msi_arch_disable(struct vpci_arch_msi *arch, struct pci_dev *pdev,
+                          unsigned int vectors);
+void vpci_msi_arch_init(struct vpci_arch_msi *arch);
+void vpci_msi_arch_print(const struct vpci_arch_msi *arch, uint16_t data,
+                         uint64_t addr);
+
 enum stdvga_cache_state {
     STDVGA_CACHE_UNINITIALIZED,
     STDVGA_CACHE_ENABLED,
diff --git a/xen/include/asm-x86/msi.h b/xen/include/asm-x86/msi.h
index 37d37b820e..43ab5c6bc6 100644
--- a/xen/include/asm-x86/msi.h
+++ b/xen/include/asm-x86/msi.h
@@ -48,6 +48,7 @@ 
 #define MSI_ADDR_REDIRECTION_SHIFT  3
 #define MSI_ADDR_REDIRECTION_CPU    (0 << MSI_ADDR_REDIRECTION_SHIFT)
 #define MSI_ADDR_REDIRECTION_LOWPRI (1 << MSI_ADDR_REDIRECTION_SHIFT)
+#define MSI_ADDR_REDIRECTION_MASK   (1 << MSI_ADDR_REDIRECTION_SHIFT)
 
 #define MSI_ADDR_DEST_ID_SHIFT		12
 #define	 MSI_ADDR_DEST_ID_MASK		0x00ff000
diff --git a/xen/include/xen/hvm/irq.h b/xen/include/xen/hvm/irq.h
index 0d2c72c109..d07185a479 100644
--- a/xen/include/xen/hvm/irq.h
+++ b/xen/include/xen/hvm/irq.h
@@ -57,7 +57,9 @@  struct dev_intx_gsi_link {
 #define VMSI_DELIV_MASK   0x7000
 #define VMSI_TRIG_MODE    0x8000
 
+#define GFLAGS_SHIFT_DEST_ID        0
 #define GFLAGS_SHIFT_RH             8
+#define GFLAGS_SHIFT_DM             9
 #define GFLAGS_SHIFT_DELIV_MODE     12
 #define GFLAGS_SHIFT_TRG_MODE       15
 
diff --git a/xen/include/xen/irq.h b/xen/include/xen/irq.h
index 0aa817e266..9b10ffa4c3 100644
--- a/xen/include/xen/irq.h
+++ b/xen/include/xen/irq.h
@@ -133,6 +133,7 @@  struct pirq {
     struct arch_pirq arch;
 };
 
+#define INVALID_PIRQ (-1)
 #define pirq_info(d, p) ((struct pirq *)radix_tree_lookup(&(d)->pirq_tree, p))
 
 /* Use this instead of pirq_info() if the structure may need allocating. */
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index 3c6beaaf4a..21da73df16 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -13,6 +13,7 @@ 
  * of just returning whether the lock is hold by any CPU).
  */
 #define vpci_rlock(d) read_lock(&(d)->arch.hvm_domain.vpci_lock)
+#define vpci_tryrlock(d) read_trylock(&(d)->arch.hvm_domain.vpci_lock)
 #define vpci_wlock(d) write_lock(&(d)->arch.hvm_domain.vpci_lock)
 #define vpci_runlock(d) read_unlock(&(d)->arch.hvm_domain.vpci_lock)
 #define vpci_wunlock(d) write_unlock(&(d)->arch.hvm_domain.vpci_lock)
@@ -93,9 +94,35 @@  struct vpci {
         } bars[7]; /* At most 6 BARS + 1 expansion ROM BAR. */
         /* FIXME: currently there's no support for SR-IOV. */
     } header;
+
+    /* MSI data. */
+    struct vpci_msi {
+        /* Arch-specific data. */
+        struct vpci_arch_msi arch;
+        /* Address. */
+        uint64_t address;
+        /* Offset of the capability in the config space. */
+        unsigned int pos;
+        /* Maximum number of vectors supported by the device. */
+        unsigned int max_vectors;
+        /* Number of vectors configured. */
+        unsigned int vectors;
+        /* Mask bitfield. */
+        uint32_t mask;
+        /* Data. */
+        uint16_t data;
+        /* Enabled? */
+        bool enabled;
+        /* Supports per-vector masking? */
+        bool masking;
+        /* 64-bit address capable? */
+        bool address64;
+    } *msi;
 #endif
 };
 
+void vpci_dump_msi(void);
+
 #endif
 
 /*