Message ID | 1456771254-17511-35-git-send-email-armbru@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hi On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote: > ivshmem can be configured with and without interrupt capability > (a.k.a. "doorbell"). The two configurations have largely disjoint > options, which makes for a confusing (and badly checked) user > interface. Moreover, the device can't tell the guest whether its > doorbell is enabled. > > Create two new device models ivshmem-plain and ivshmem-doorbell, and > deprecate the old one. > > Changes from ivshmem: > > * PCI revision is 1 instead of 0. The new revision is fully backwards > compatible for guests. Guests may elect to require at least > revision 1 to make sure they're not exposed to the funny "no shared > memory, yet" state. > > * Property "role" replaced by "master". role=master becomes > master=on, role=peer becomes master=off. Default is off instead of > auto. > > * Property "use64" is gone. The new devices always have 64 bit BARs. > > Changes from ivshmem to ivshmem-plain: > > * The Interrupt Pin register in PCI config space is zero (does not use > an interrupt pin) instead of one (uses INTA). > > * Property "x-memdev" is renamed to "memdev". > > * Properties "shm" and "size" are gone. Use property "memdev" > instead. > > * Property "msi" is gone. The new device can't have MSI-X capability. > It can't interrupt anyway. > > * Properties "ioeventfd" and "vectors" are gone. They're meaningless > without interrupts anyway. > > Changes from ivshmem to ivshmem-doorbell: > > * Property "msi" is gone. The new device always has MSI-X capability. > > * Property "ioeventfd" defaults to on instead of off. > > * Property "size" is gone. The new device can only map all the shared > memory received from the server. > > Guests can easily find out whether the device is configured for > interrupts by checking for MSI-X capability. > > Note: some code added in sub-optimal places to make the diff easier to > review. The next commit will move it to more sensible places. > > Signed-off-by: Markus Armbruster <armbru@redhat.com> > --- This is a large patch that could perhaps be splitted, but I didn't see any issue with it so: Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> > docs/specs/ivshmem-spec.txt | 66 +++++----- > hw/misc/ivshmem.c | 310 ++++++++++++++++++++++++++++++++------------ > qemu-doc.texi | 33 ++--- > tests/ivshmem-test.c | 12 +- > 4 files changed, 288 insertions(+), 133 deletions(-) > > diff --git a/docs/specs/ivshmem-spec.txt b/docs/specs/ivshmem-spec.txt > index 3eb8c97..047dfc0 100644 > --- a/docs/specs/ivshmem-spec.txt > +++ b/docs/specs/ivshmem-spec.txt > @@ -17,9 +17,10 @@ get interrupted by its peers. > > There are two basic configurations: > > -- Just shared memory: -device ivshmem,shm=NAME,... > +- Just shared memory: -device ivshmem-plain,memdev=HMB,... > > - This uses shared memory object NAME. > + This uses host memory backend HMB. It should have option "share" > + set. > > - Shared memory plus interrupts: -device ivshmem,chardev=CHR,vectors=N,... > > @@ -30,9 +31,8 @@ There are two basic configurations: > Each peer gets assigned a unique ID by the server. IDs must be > between 0 and 65535. > > - Interrupts are message-signaled by default (MSI-X). With msi=off > - the device has no MSI-X capability, and uses legacy INTx instead. > - vectors=N configures the number of vectors to use. > + Interrupts are message-signaled (MSI-X). vectors=N configures the > + number of vectors to use. > > For more details on ivshmem device properties, see The QEMU Emulator > User Documentation (qemu-doc.*). > @@ -40,14 +40,15 @@ User Documentation (qemu-doc.*). > > == The ivshmem PCI device's guest interface == > > -The device has vendor ID 1af4, device ID 1110, revision 0. > +The device has vendor ID 1af4, device ID 1110, revision 1. Before > +QEMU 2.6.0, it had revision 0. > > === PCI BARs === > > The ivshmem PCI device has two or three BARs: > > - BAR0 holds device registers (256 Byte MMIO) > -- BAR1 holds MSI-X table and PBA (only when using MSI-X) > +- BAR1 holds MSI-X table and PBA (only ivshmem-doorbell) > - BAR2 maps the shared memory object > > There are two ways to use this device: > @@ -58,18 +59,19 @@ There are two ways to use this device: > user space (see http://dpdk.org/browse/memnic). > > - If you additionally need the capability for peers to interrupt each > - other, you need BAR0 and, if using MSI-X, BAR1. You will most > - likely want to write a kernel driver to handle interrupts. Requires > - the device to be configured for interrupts, obviously. > + other, you need BAR0 and BAR1. You will most likely want to write a > + kernel driver to handle interrupts. Requires the device to be > + configured for interrupts, obviously. > > Before QEMU 2.6.0, BAR2 can initially be invalid if the device is > configured for interrupts. It becomes safely accessible only after > -the ivshmem server provided the shared memory. Guest software should > -wait for the IVPosition register (described below) to become > -non-negative before accessing BAR2. > +the ivshmem server provided the shared memory. These devices have PCI > +revision 0 rather than 1. Guest software should wait for the > +IVPosition register (described below) to become non-negative before > +accessing BAR2. > > -The device is not capable to tell guest software whether it is > -configured for interrupts. > +Revision 0 of the device is not capable to tell guest software whether > +it is configured for interrupts. > > === PCI device registers === > > @@ -77,10 +79,12 @@ BAR 0 contains the following registers: > > Offset Size Access On reset Function > 0 4 read/write 0 Interrupt Mask > - bit 0: peer interrupt > + bit 0: peer interrupt (rev 0) > + reserved (rev 1) > bit 1..31: reserved > 4 4 read/write 0 Interrupt Status > - bit 0: peer interrupt > + bit 0: peer interrupt (rev 0) > + reserved (rev 1) > bit 1..31: reserved > 8 4 read-only 0 or ID IVPosition > 12 4 write-only N/A Doorbell > @@ -92,18 +96,18 @@ Software should only access the registers as specified in column > "Access". Reserved bits should be ignored on read, and preserved on > write. > > -Interrupt Status and Mask Register together control the legacy INTx > -interrupt when the device has no MSI-X capability: INTx is asserted > -when the bit-wise AND of Status and Mask is non-zero and the device > -has no MSI-X capability. Interrupt Status Register bit 0 becomes 1 > -when an interrupt request from a peer is received. Reading the > -register clears it. > +In revision 0 of the device, Interrupt Status and Mask Register > +together control the legacy INTx interrupt when the device has no > +MSI-X capability: INTx is asserted when the bit-wise AND of Status and > +Mask is non-zero and the device has no MSI-X capability. Interrupt > +Status Register bit 0 becomes 1 when an interrupt request from a peer > +is received. Reading the register clears it. > > IVPosition Register: if the device is not configured for interrupts, > this is zero. Else, it is the device's ID (between 0 and 65535). > > Before QEMU 2.6.0, the register may read -1 for a short while after > -reset. > +reset. These devices have PCI revision 0 rather than 1. > > There is no good way for software to find out whether the device is > configured for interrupts. A positive IVPosition means interrupts, > @@ -124,14 +128,14 @@ interrupt vectors connected, the write is ignored. The device is not > capable to tell guest software what peers are connected, or how many > interrupt vectors are connected. > > -If the peer doesn't use MSI-X, its Interrupt Status register is set to > -1. This asserts INTx unless masked by the Interrupt Mask register. > -The device is not capable to communicate the interrupt vector to guest > -software then. > +The peer's interrupt for this vector then becomes pending. There is > +no way for software to clear the pending bit, and a polling mode of > +operation is therefore impossible. > > -If the peer uses MSI-X, the interrupt for this vector becomes pending. > -There is no way for software to clear the pending bit, and a polling > -mode of operation is therefore impossible with MSI-X. > +If the peer is a revision 0 device without MSI-X capability, its > +Interrupt Status register is set to 1. This asserts INTx unless > +masked by the Interrupt Mask register. The device is not capable to > +communicate the interrupt vector to guest software then. > > With multiple MSI-X vectors, different vectors can be used to indicate > different events have occurred. The semantics of interrupt vectors > diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c > index b39ea27..f7f5b3b 100644 > --- a/hw/misc/ivshmem.c > +++ b/hw/misc/ivshmem.c > @@ -29,6 +29,7 @@ > #include "qom/object_interfaces.h" > #include "sysemu/char.h" > #include "sysemu/hostmem.h" > +#include "sysemu/qtest.h" > #include "qapi/visitor.h" > #include "exec/ram_addr.h" > > @@ -53,6 +54,18 @@ > } \ > } while (0) > > +#define TYPE_IVSHMEM_COMMON "ivshmem-common" > +#define IVSHMEM_COMMON(obj) \ > + OBJECT_CHECK(IVShmemState, (obj), TYPE_IVSHMEM_COMMON) > + > +#define TYPE_IVSHMEM_PLAIN "ivshmem-plain" > +#define IVSHMEM_PLAIN(obj) \ > + OBJECT_CHECK(IVShmemState, (obj), TYPE_IVSHMEM_PLAIN) > + > +#define TYPE_IVSHMEM_DOORBELL "ivshmem-doorbell" > +#define IVSHMEM_DOORBELL(obj) \ > + OBJECT_CHECK(IVShmemState, (obj), TYPE_IVSHMEM_DOORBELL) > + > #define TYPE_IVSHMEM "ivshmem" > #define IVSHMEM(obj) \ > OBJECT_CHECK(IVShmemState, (obj), TYPE_IVSHMEM) > @@ -80,8 +93,6 @@ typedef struct IVShmemState { > MemoryRegion ivshmem_mmio; > > MemoryRegion *ivshmem_bar2; /* BAR 2 (shared memory) */ > - size_t ivshmem_size; /* size of shared memory region */ > - uint32_t ivshmem_64bit; > > Peer *peers; > int nb_peers; /* space in @peers[] */ > @@ -96,9 +107,12 @@ typedef struct IVShmemState { > OnOffAuto master; > Error *migration_blocker; > > - char * shmobj; > - char * sizearg; > - char * role; > + /* legacy cruft */ > + char *role; > + char *shmobj; > + char *sizearg; > + size_t legacy_size; > + uint32_t not_legacy_32bit; > } IVShmemState; > > /* registers for the Inter-VM shared memory device */ > @@ -258,7 +272,7 @@ static void ivshmem_vector_notify(void *opaque) > { > MSIVector *entry = opaque; > PCIDevice *pdev = entry->pdev; > - IVShmemState *s = IVSHMEM(pdev); > + IVShmemState *s = IVSHMEM_COMMON(pdev); > int vector = entry - s->msi_vectors; > EventNotifier *n = &s->peers[s->vm_id].eventfds[vector]; > > @@ -279,7 +293,7 @@ static void ivshmem_vector_notify(void *opaque) > static int ivshmem_vector_unmask(PCIDevice *dev, unsigned vector, > MSIMessage msg) > { > - IVShmemState *s = IVSHMEM(dev); > + IVShmemState *s = IVSHMEM_COMMON(dev); > EventNotifier *n = &s->peers[s->vm_id].eventfds[vector]; > MSIVector *v = &s->msi_vectors[vector]; > int ret; > @@ -296,7 +310,7 @@ static int ivshmem_vector_unmask(PCIDevice *dev, unsigned vector, > > static void ivshmem_vector_mask(PCIDevice *dev, unsigned vector) > { > - IVShmemState *s = IVSHMEM(dev); > + IVShmemState *s = IVSHMEM_COMMON(dev); > EventNotifier *n = &s->peers[s->vm_id].eventfds[vector]; > int ret; > > @@ -313,7 +327,7 @@ static void ivshmem_vector_poll(PCIDevice *dev, > unsigned int vector_start, > unsigned int vector_end) > { > - IVShmemState *s = IVSHMEM(dev); > + IVShmemState *s = IVSHMEM_COMMON(dev); > unsigned int vector; > > IVSHMEM_DPRINTF("vector poll %p %d-%d\n", dev, vector_start, vector_end); > @@ -460,6 +474,7 @@ static void setup_interrupt(IVShmemState *s, int vector, Error **errp) > static void process_msg_shmem(IVShmemState *s, int fd, Error **errp) > { > struct stat buf; > + size_t size; > void *ptr; > > if (s->ivshmem_bar2) { > @@ -475,15 +490,21 @@ static void process_msg_shmem(IVShmemState *s, int fd, Error **errp) > return; > } > > - if (s->ivshmem_size > buf.st_size) { > - error_setg(errp, "server sent only %zd bytes of shared memory", > - (size_t)buf.st_size); > - close(fd); > - return; > + size = buf.st_size; > + > + /* Legacy cruft */ > + if (s->legacy_size != SIZE_MAX) { > + if (size < s->legacy_size) { > + error_setg(errp, "server sent only %zd bytes of shared memory", > + (size_t)buf.st_size); > + close(fd); > + return; > + } > + size = s->legacy_size; > } > > /* mmap the region and map into the BAR2 */ > - ptr = mmap(0, s->ivshmem_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); > + ptr = mmap(0, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); > if (ptr == MAP_FAILED) { > error_setg_errno(errp, errno, "Failed to mmap shared memory"); > close(fd); > @@ -491,7 +512,7 @@ static void process_msg_shmem(IVShmemState *s, int fd, Error **errp) > } > s->ivshmem_bar2 = g_new(MemoryRegion, 1); > memory_region_init_ram_ptr(s->ivshmem_bar2, OBJECT(s), > - "ivshmem.bar2", s->ivshmem_size, ptr); > + "ivshmem.bar2", size, ptr); > qemu_set_ram_fd(s->ivshmem_bar2->ram_addr, fd); > } > > @@ -700,7 +721,7 @@ static void ivshmem_vector_use(IVShmemState *s) > > static void ivshmem_reset(DeviceState *d) > { > - IVShmemState *s = IVSHMEM(d); > + IVShmemState *s = IVSHMEM_COMMON(d); > > s->intrstatus = 0; > s->intrmask = 0; > @@ -776,7 +797,7 @@ static void ivshmem_disable_irqfd(IVShmemState *s) > static void ivshmem_write_config(PCIDevice *pdev, uint32_t address, > uint32_t val, int len) > { > - IVShmemState *s = IVSHMEM(pdev); > + IVShmemState *s = IVSHMEM_COMMON(pdev); > int is_enabled, was_enabled = msix_enabled(pdev); > > pci_default_write_config(pdev, address, val, len); > @@ -818,42 +839,14 @@ static HostMemoryBackend *desugar_shm(const char *shm, size_t size) > return MEMORY_BACKEND(obj); > } > > -static void pci_ivshmem_realize(PCIDevice *dev, Error **errp) > +static void ivshmem_common_realize(PCIDevice *dev, Error **errp) > { > - IVShmemState *s = IVSHMEM(dev); > + IVShmemState *s = IVSHMEM_COMMON(dev); > Error *err = NULL; > uint8_t *pci_conf; > uint8_t attr = PCI_BASE_ADDRESS_SPACE_MEMORY | > PCI_BASE_ADDRESS_MEM_PREFETCH; > > - if (!!s->server_chr + !!s->shmobj + !!s->hostmem != 1) { > - error_setg(errp, > - "You must specify either 'shm', 'chardev' or 'x-memdev'"); > - return; > - } > - > - if (s->hostmem) { > - MemoryRegion *mr; > - > - if (s->sizearg) { > - g_warning("size argument ignored with hostmem"); > - } > - > - mr = host_memory_backend_get_memory(s->hostmem, &error_abort); > - s->ivshmem_size = memory_region_size(mr); > - } else if (s->sizearg == NULL) { > - s->ivshmem_size = 4 << 20; /* 4 MB default */ > - } else { > - char *end; > - int64_t size = qemu_strtosz(s->sizearg, &end); > - if (size < 0 || (size_t)size != size || *end != '\0' > - || !is_power_of_2(size)) { > - error_setg(errp, "Invalid size %s", s->sizearg); > - return; > - } > - s->ivshmem_size = size; > - } > - > /* IRQFD requires MSI */ > if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD) && > !ivshmem_has_feature(s, IVSHMEM_MSI)) { > @@ -861,29 +854,9 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp) > return; > } > > - /* check that role is reasonable */ > - if (s->role) { > - if (strncmp(s->role, "peer", 5) == 0) { > - s->master = ON_OFF_AUTO_OFF; > - } else if (strncmp(s->role, "master", 7) == 0) { > - s->master = ON_OFF_AUTO_ON; > - } else { > - error_setg(errp, "'role' must be 'peer' or 'master'"); > - return; > - } > - } else { > - s->master = ON_OFF_AUTO_AUTO; > - } > - > pci_conf = dev->config; > pci_conf[PCI_COMMAND] = PCI_COMMAND_IO | PCI_COMMAND_MEMORY; > > - /* > - * Note: we don't use INTx with IVSHMEM_MSI at all, so this is a > - * bald-faced lie then. But it's a backwards compatible lie. > - */ > - pci_config_set_interrupt_pin(pci_conf, 1); > - > memory_region_init_io(&s->ivshmem_mmio, OBJECT(s), &ivshmem_mmio_ops, s, > "ivshmem-mmio", IVSHMEM_REG_BAR_SIZE); > > @@ -891,14 +864,10 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp) > pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, > &s->ivshmem_mmio); > > - if (s->ivshmem_64bit) { > + if (!s->not_legacy_32bit) { > attr |= PCI_BASE_ADDRESS_MEM_TYPE_64; > } > > - if (s->shmobj) { > - s->hostmem = desugar_shm(s->shmobj, s->ivshmem_size); > - } > - > if (s->hostmem != NULL) { > IVSHMEM_DPRINTF("using hostmem\n"); > > @@ -945,9 +914,68 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp) > } > } > > -static void pci_ivshmem_exit(PCIDevice *dev) > +static void ivshmem_realize(PCIDevice *dev, Error **errp) > { > - IVShmemState *s = IVSHMEM(dev); > + IVShmemState *s = IVSHMEM_COMMON(dev); > + > + if (!qtest_enabled()) { > + error_report("ivshmem is deprecated, please use ivshmem-plain" > + " or ivshmem-doorbell instead"); > + } > + > + if (!!s->server_chr + !!s->shmobj + !!s->hostmem != 1) { > + error_setg(errp, > + "You must specify either 'shm', 'chardev' or 'x-memdev'"); > + return; > + } > + > + if (s->hostmem) { > + if (s->sizearg) { > + g_warning("size argument ignored with hostmem"); > + } > + } else if (s->sizearg == NULL) { > + s->legacy_size = 4 << 20; /* 4 MB default */ > + } else { > + char *end; > + int64_t size = qemu_strtosz(s->sizearg, &end); > + if (size < 0 || (size_t)size != size || *end != '\0' > + || !is_power_of_2(size)) { > + error_setg(errp, "Invalid size %s", s->sizearg); > + return; > + } > + s->legacy_size = size; > + } > + > + /* check that role is reasonable */ > + if (s->role) { > + if (strncmp(s->role, "peer", 5) == 0) { > + s->master = ON_OFF_AUTO_OFF; > + } else if (strncmp(s->role, "master", 7) == 0) { > + s->master = ON_OFF_AUTO_ON; > + } else { > + error_setg(errp, "'role' must be 'peer' or 'master'"); > + return; > + } > + } else { > + s->master = ON_OFF_AUTO_AUTO; > + } > + > + if (s->shmobj) { > + s->hostmem = desugar_shm(s->shmobj, s->legacy_size); > + } > + > + /* > + * Note: we don't use INTx with IVSHMEM_MSI at all, so this is a > + * bald-faced lie then. But it's a backwards compatible lie. > + */ > + pci_config_set_interrupt_pin(dev->config, 1); > + > + ivshmem_common_realize(dev, errp); > +} > + > +static void ivshmem_exit(PCIDevice *dev) > +{ > + IVShmemState *s = IVSHMEM_COMMON(dev); > int i; > > if (s->migration_blocker) { > @@ -959,7 +987,7 @@ static void pci_ivshmem_exit(PCIDevice *dev) > if (!s->hostmem) { > void *addr = memory_region_get_ram_ptr(s->ivshmem_bar2); > > - if (munmap(addr, s->ivshmem_size) == -1) { > + if (munmap(addr, memory_region_size(s->ivshmem_bar2) == -1)) { > error_report("Failed to munmap shared memory %s", > strerror(errno)); > } > @@ -1074,28 +1102,39 @@ static Property ivshmem_properties[] = { > DEFINE_PROP_BIT("msi", IVShmemState, features, IVSHMEM_MSI, true), > DEFINE_PROP_STRING("shm", IVShmemState, shmobj), > DEFINE_PROP_STRING("role", IVShmemState, role), > - DEFINE_PROP_UINT32("use64", IVShmemState, ivshmem_64bit, 1), > + DEFINE_PROP_UINT32("use64", IVShmemState, not_legacy_32bit, 1), > DEFINE_PROP_END_OF_LIST(), > }; > > -static void ivshmem_class_init(ObjectClass *klass, void *data) > +static void ivshmem_common_class_init(ObjectClass *klass, void *data) > { > DeviceClass *dc = DEVICE_CLASS(klass); > PCIDeviceClass *k = PCI_DEVICE_CLASS(klass); > > - k->realize = pci_ivshmem_realize; > - k->exit = pci_ivshmem_exit; > + k->realize = ivshmem_common_realize; > + k->exit = ivshmem_exit; > k->config_write = ivshmem_write_config; > k->vendor_id = PCI_VENDOR_ID_IVSHMEM; > k->device_id = PCI_DEVICE_ID_IVSHMEM; > k->class_id = PCI_CLASS_MEMORY_RAM; > + k->revision = 1; > dc->reset = ivshmem_reset; > - dc->props = ivshmem_properties; > - dc->vmsd = &ivshmem_vmsd; > set_bit(DEVICE_CATEGORY_MISC, dc->categories); > dc->desc = "Inter-VM shared memory"; > } > > +static void ivshmem_class_init(ObjectClass *klass, void *data) > +{ > + DeviceClass *dc = DEVICE_CLASS(klass); > + PCIDeviceClass *k = PCI_DEVICE_CLASS(klass); > + > + k->realize = ivshmem_realize; > + k->revision = 0; > + dc->desc = "Inter-VM shared memory (legacy)"; > + dc->props = ivshmem_properties; > + dc->vmsd = &ivshmem_vmsd; > +} > + > static void ivshmem_check_memdev_is_busy(Object *obj, const char *name, > Object *val, Error **errp) > { > @@ -1122,16 +1161,121 @@ static void ivshmem_init(Object *obj) > &error_abort); > } > > +static const TypeInfo ivshmem_common_info = { > + .name = TYPE_IVSHMEM_COMMON, > + .parent = TYPE_PCI_DEVICE, > + .instance_size = sizeof(IVShmemState), > + .abstract = true, > + .class_init = ivshmem_common_class_init, > +}; > + > static const TypeInfo ivshmem_info = { > .name = TYPE_IVSHMEM, > - .parent = TYPE_PCI_DEVICE, > + .parent = TYPE_IVSHMEM_COMMON, > .instance_size = sizeof(IVShmemState), > .instance_init = ivshmem_init, > .class_init = ivshmem_class_init, > }; > > +static const VMStateDescription ivshmem_plain_vmsd = { > + .name = TYPE_IVSHMEM_PLAIN, > + .version_id = 0, > + .minimum_version_id = 0, > + .pre_load = ivshmem_pre_load, > + .post_load = ivshmem_post_load, > + .fields = (VMStateField[]) { > + VMSTATE_PCI_DEVICE(parent_obj, IVShmemState), > + VMSTATE_UINT32(intrstatus, IVShmemState), > + VMSTATE_UINT32(intrmask, IVShmemState), > + VMSTATE_END_OF_LIST() > + }, > +}; > + > +static Property ivshmem_plain_properties[] = { > + DEFINE_PROP_ON_OFF_AUTO("master", IVShmemState, master, ON_OFF_AUTO_OFF), > + DEFINE_PROP_END_OF_LIST(), > +}; > + > +static void ivshmem_plain_init(Object *obj) > +{ > + IVShmemState *s = IVSHMEM_PLAIN(obj); > + > + object_property_add_link(obj, "memdev", TYPE_MEMORY_BACKEND, > + (Object **)&s->hostmem, > + ivshmem_check_memdev_is_busy, > + OBJ_PROP_LINK_UNREF_ON_RELEASE, > + &error_abort); > +} > + > +static void ivshmem_plain_class_init(ObjectClass *klass, void *data) > +{ > + DeviceClass *dc = DEVICE_CLASS(klass); > + > + dc->props = ivshmem_plain_properties; > + dc->vmsd = &ivshmem_plain_vmsd; > +} > + > +static const TypeInfo ivshmem_plain_info = { > + .name = TYPE_IVSHMEM_PLAIN, > + .parent = TYPE_IVSHMEM_COMMON, > + .instance_size = sizeof(IVShmemState), > + .instance_init = ivshmem_plain_init, > + .class_init = ivshmem_plain_class_init, > +}; > + > +static const VMStateDescription ivshmem_doorbell_vmsd = { > + .name = TYPE_IVSHMEM_DOORBELL, > + .version_id = 0, > + .minimum_version_id = 0, > + .pre_load = ivshmem_pre_load, > + .post_load = ivshmem_post_load, > + .fields = (VMStateField[]) { > + VMSTATE_PCI_DEVICE(parent_obj, IVShmemState), > + VMSTATE_MSIX(parent_obj, IVShmemState), > + VMSTATE_UINT32(intrstatus, IVShmemState), > + VMSTATE_UINT32(intrmask, IVShmemState), > + VMSTATE_END_OF_LIST() > + }, > +}; > + > +static Property ivshmem_doorbell_properties[] = { > + DEFINE_PROP_CHR("chardev", IVShmemState, server_chr), > + DEFINE_PROP_UINT32("vectors", IVShmemState, vectors, 1), > + DEFINE_PROP_BIT("ioeventfd", IVShmemState, features, IVSHMEM_IOEVENTFD, > + true), > + DEFINE_PROP_ON_OFF_AUTO("master", IVShmemState, master, ON_OFF_AUTO_OFF), > + DEFINE_PROP_END_OF_LIST(), > +}; > + > +static void ivshmem_doorbell_init(Object *obj) > +{ > + IVShmemState *s = IVSHMEM_DOORBELL(obj); > + > + s->features |= (1 << IVSHMEM_MSI); > + s->legacy_size = SIZE_MAX; /* whatever the server sends */ > +} > + > +static void ivshmem_doorbell_class_init(ObjectClass *klass, void *data) > +{ > + DeviceClass *dc = DEVICE_CLASS(klass); > + > + dc->props = ivshmem_doorbell_properties; > + dc->vmsd = &ivshmem_doorbell_vmsd; > +} > + > +static const TypeInfo ivshmem_doorbell_info = { > + .name = TYPE_IVSHMEM_DOORBELL, > + .parent = TYPE_IVSHMEM_COMMON, > + .instance_size = sizeof(IVShmemState), > + .instance_init = ivshmem_doorbell_init, > + .class_init = ivshmem_doorbell_class_init, > +}; > + > static void ivshmem_register_types(void) > { > + type_register_static(&ivshmem_common_info); > + type_register_static(&ivshmem_plain_info); > + type_register_static(&ivshmem_doorbell_info); > type_register_static(&ivshmem_info); > } > > diff --git a/qemu-doc.texi b/qemu-doc.texi > index 8afbbcd..0dd01c7 100644 > --- a/qemu-doc.texi > +++ b/qemu-doc.texi > @@ -1262,13 +1262,18 @@ basic example. > > @subsection Inter-VM Shared Memory device > > -With KVM enabled on a Linux host, a shared memory device is available. Guests > -map a POSIX shared memory region into the guest as a PCI device that enables > -zero-copy communication to the application level of the guests. The basic > -syntax is: > +On Linux hosts, a shared memory device is available. The basic syntax > +is: > > @example > -qemu-system-i386 -device ivshmem,size=@var{size},shm=@var{shm-name} > +qemu-system-x86_64 -device ivshmem-plain,memdev=@var{hostmem} > +@end example > + > +where @var{hostmem} names a host memory backend. For a POSIX shared > +memory backend, use something like > + > +@example > +-object memory-backend-file,size=1M,share,mem-path=/dev/shm/ivshmem,id=@var{hostmem} > @end example > > If desired, interrupts can be sent between guest VMs accessing the same shared > @@ -1282,8 +1287,7 @@ memory server is: > ivshmem-server -p @var{pidfile} -S @var{path} -m @var{shm-name} -l @var{shm-size} -n @var{vectors} > > # Then start your qemu instances with matching arguments > -qemu-system-i386 -device ivshmem,size=@var{shm-size},vectors=@var{vectors},chardev=@var{id} > - [,msi=on][,ioeventfd=on][,role=peer|master] > +qemu-system-x86_64 -device ivshmem-doorbell,vectors=@var{vectors},chardev=@var{id} > -chardev socket,path=@var{path},id=@var{id} > @end example > > @@ -1291,12 +1295,11 @@ When using the server, the guest will be assigned a VM ID (>=0) that allows gues > using the same server to communicate via interrupts. Guests can read their > VM ID from a device register (see ivshmem-spec.txt). > > -The @option{role} argument can be set to either master or peer and will affect > -how the shared memory is migrated. With @option{role=master}, the guest will > -copy the shared memory on migration to the destination host. With > -@option{role=peer}, the guest will not be able to migrate with the device attached. > -With the @option{peer} case, the device should be detached and then reattached > -after migration using the PCI hotplug support. > +With device property @option{master=on}, the guest will copy the shared > +memory on migration to the destination host. With @option{master=off}, > +the guest will not be able to migrate with the device attached. In the > +latter case, the device should be detached and then reattached after > +migration using the PCI hotplug support. > > @subsubsection ivshmem and hugepages > > @@ -1304,8 +1307,8 @@ Instead of specifying the <shm size> using POSIX shm, you may specify > a memory backend that has hugepage support: > > @example > -qemu-system-i386 -object memory-backend-file,size=1G,mem-path=/dev/hugepages/my-shmem-file,share,id=mb1 > - -device ivshmem,x-memdev=mb1 > +qemu-system-x86_64 -object memory-backend-file,size=1G,mem-path=/dev/hugepages/my-shmem-file,share,id=mb1 > + -device ivshmem-plain,memdev=mb1 > @end example > > ivshmem-server also supports hugepages mount points with the > diff --git a/tests/ivshmem-test.c b/tests/ivshmem-test.c > index 68d6840..891b6b8 100644 > --- a/tests/ivshmem-test.c > +++ b/tests/ivshmem-test.c > @@ -127,7 +127,9 @@ static void setup_vm_cmd(IVState *s, const char *cmd, bool msix) > > static void setup_vm(IVState *s) > { > - char *cmd = g_strdup_printf("-device ivshmem,shm=%s,size=1M", tmpshm); > + char *cmd = g_strdup_printf("-object memory-backend-file" > + ",id=mb1,size=1M,share,mem-path=/dev/shm%s" > + " -device ivshmem-plain,memdev=mb1", tmpshm); > > setup_vm_cmd(s, cmd, false); > > @@ -284,8 +286,10 @@ static void *server_thread(void *data) > static void setup_vm_with_server(IVState *s, int nvectors, bool msi) > { > char *cmd = g_strdup_printf("-chardev socket,id=chr0,path=%s,nowait " > - "-device ivshmem,size=1M,chardev=chr0,vectors=%d,msi=%s", > - tmpserver, nvectors, msi ? "true" : "false"); > + "-device ivshmem%s,chardev=chr0,vectors=%d", > + tmpserver, > + msi ? "-doorbell" : ",size=1M,msi=off", > + nvectors); > > setup_vm_cmd(s, cmd, msi); > > @@ -412,7 +416,7 @@ static void test_ivshmem_memdev(void) > > /* just for the sake of checking memory-backend property */ > setup_vm_cmd(&state, "-object memory-backend-ram,size=1M,id=mb1" > - " -device ivshmem,x-memdev=mb1", false); > + " -device ivshmem-plain,memdev=mb1", false); > > cleanup_vm(&state); > } > -- > 2.4.3 > >
diff --git a/docs/specs/ivshmem-spec.txt b/docs/specs/ivshmem-spec.txt index 3eb8c97..047dfc0 100644 --- a/docs/specs/ivshmem-spec.txt +++ b/docs/specs/ivshmem-spec.txt @@ -17,9 +17,10 @@ get interrupted by its peers. There are two basic configurations: -- Just shared memory: -device ivshmem,shm=NAME,... +- Just shared memory: -device ivshmem-plain,memdev=HMB,... - This uses shared memory object NAME. + This uses host memory backend HMB. It should have option "share" + set. - Shared memory plus interrupts: -device ivshmem,chardev=CHR,vectors=N,... @@ -30,9 +31,8 @@ There are two basic configurations: Each peer gets assigned a unique ID by the server. IDs must be between 0 and 65535. - Interrupts are message-signaled by default (MSI-X). With msi=off - the device has no MSI-X capability, and uses legacy INTx instead. - vectors=N configures the number of vectors to use. + Interrupts are message-signaled (MSI-X). vectors=N configures the + number of vectors to use. For more details on ivshmem device properties, see The QEMU Emulator User Documentation (qemu-doc.*). @@ -40,14 +40,15 @@ User Documentation (qemu-doc.*). == The ivshmem PCI device's guest interface == -The device has vendor ID 1af4, device ID 1110, revision 0. +The device has vendor ID 1af4, device ID 1110, revision 1. Before +QEMU 2.6.0, it had revision 0. === PCI BARs === The ivshmem PCI device has two or three BARs: - BAR0 holds device registers (256 Byte MMIO) -- BAR1 holds MSI-X table and PBA (only when using MSI-X) +- BAR1 holds MSI-X table and PBA (only ivshmem-doorbell) - BAR2 maps the shared memory object There are two ways to use this device: @@ -58,18 +59,19 @@ There are two ways to use this device: user space (see http://dpdk.org/browse/memnic). - If you additionally need the capability for peers to interrupt each - other, you need BAR0 and, if using MSI-X, BAR1. You will most - likely want to write a kernel driver to handle interrupts. Requires - the device to be configured for interrupts, obviously. + other, you need BAR0 and BAR1. You will most likely want to write a + kernel driver to handle interrupts. Requires the device to be + configured for interrupts, obviously. Before QEMU 2.6.0, BAR2 can initially be invalid if the device is configured for interrupts. It becomes safely accessible only after -the ivshmem server provided the shared memory. Guest software should -wait for the IVPosition register (described below) to become -non-negative before accessing BAR2. +the ivshmem server provided the shared memory. These devices have PCI +revision 0 rather than 1. Guest software should wait for the +IVPosition register (described below) to become non-negative before +accessing BAR2. -The device is not capable to tell guest software whether it is -configured for interrupts. +Revision 0 of the device is not capable to tell guest software whether +it is configured for interrupts. === PCI device registers === @@ -77,10 +79,12 @@ BAR 0 contains the following registers: Offset Size Access On reset Function 0 4 read/write 0 Interrupt Mask - bit 0: peer interrupt + bit 0: peer interrupt (rev 0) + reserved (rev 1) bit 1..31: reserved 4 4 read/write 0 Interrupt Status - bit 0: peer interrupt + bit 0: peer interrupt (rev 0) + reserved (rev 1) bit 1..31: reserved 8 4 read-only 0 or ID IVPosition 12 4 write-only N/A Doorbell @@ -92,18 +96,18 @@ Software should only access the registers as specified in column "Access". Reserved bits should be ignored on read, and preserved on write. -Interrupt Status and Mask Register together control the legacy INTx -interrupt when the device has no MSI-X capability: INTx is asserted -when the bit-wise AND of Status and Mask is non-zero and the device -has no MSI-X capability. Interrupt Status Register bit 0 becomes 1 -when an interrupt request from a peer is received. Reading the -register clears it. +In revision 0 of the device, Interrupt Status and Mask Register +together control the legacy INTx interrupt when the device has no +MSI-X capability: INTx is asserted when the bit-wise AND of Status and +Mask is non-zero and the device has no MSI-X capability. Interrupt +Status Register bit 0 becomes 1 when an interrupt request from a peer +is received. Reading the register clears it. IVPosition Register: if the device is not configured for interrupts, this is zero. Else, it is the device's ID (between 0 and 65535). Before QEMU 2.6.0, the register may read -1 for a short while after -reset. +reset. These devices have PCI revision 0 rather than 1. There is no good way for software to find out whether the device is configured for interrupts. A positive IVPosition means interrupts, @@ -124,14 +128,14 @@ interrupt vectors connected, the write is ignored. The device is not capable to tell guest software what peers are connected, or how many interrupt vectors are connected. -If the peer doesn't use MSI-X, its Interrupt Status register is set to -1. This asserts INTx unless masked by the Interrupt Mask register. -The device is not capable to communicate the interrupt vector to guest -software then. +The peer's interrupt for this vector then becomes pending. There is +no way for software to clear the pending bit, and a polling mode of +operation is therefore impossible. -If the peer uses MSI-X, the interrupt for this vector becomes pending. -There is no way for software to clear the pending bit, and a polling -mode of operation is therefore impossible with MSI-X. +If the peer is a revision 0 device without MSI-X capability, its +Interrupt Status register is set to 1. This asserts INTx unless +masked by the Interrupt Mask register. The device is not capable to +communicate the interrupt vector to guest software then. With multiple MSI-X vectors, different vectors can be used to indicate different events have occurred. The semantics of interrupt vectors diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c index b39ea27..f7f5b3b 100644 --- a/hw/misc/ivshmem.c +++ b/hw/misc/ivshmem.c @@ -29,6 +29,7 @@ #include "qom/object_interfaces.h" #include "sysemu/char.h" #include "sysemu/hostmem.h" +#include "sysemu/qtest.h" #include "qapi/visitor.h" #include "exec/ram_addr.h" @@ -53,6 +54,18 @@ } \ } while (0) +#define TYPE_IVSHMEM_COMMON "ivshmem-common" +#define IVSHMEM_COMMON(obj) \ + OBJECT_CHECK(IVShmemState, (obj), TYPE_IVSHMEM_COMMON) + +#define TYPE_IVSHMEM_PLAIN "ivshmem-plain" +#define IVSHMEM_PLAIN(obj) \ + OBJECT_CHECK(IVShmemState, (obj), TYPE_IVSHMEM_PLAIN) + +#define TYPE_IVSHMEM_DOORBELL "ivshmem-doorbell" +#define IVSHMEM_DOORBELL(obj) \ + OBJECT_CHECK(IVShmemState, (obj), TYPE_IVSHMEM_DOORBELL) + #define TYPE_IVSHMEM "ivshmem" #define IVSHMEM(obj) \ OBJECT_CHECK(IVShmemState, (obj), TYPE_IVSHMEM) @@ -80,8 +93,6 @@ typedef struct IVShmemState { MemoryRegion ivshmem_mmio; MemoryRegion *ivshmem_bar2; /* BAR 2 (shared memory) */ - size_t ivshmem_size; /* size of shared memory region */ - uint32_t ivshmem_64bit; Peer *peers; int nb_peers; /* space in @peers[] */ @@ -96,9 +107,12 @@ typedef struct IVShmemState { OnOffAuto master; Error *migration_blocker; - char * shmobj; - char * sizearg; - char * role; + /* legacy cruft */ + char *role; + char *shmobj; + char *sizearg; + size_t legacy_size; + uint32_t not_legacy_32bit; } IVShmemState; /* registers for the Inter-VM shared memory device */ @@ -258,7 +272,7 @@ static void ivshmem_vector_notify(void *opaque) { MSIVector *entry = opaque; PCIDevice *pdev = entry->pdev; - IVShmemState *s = IVSHMEM(pdev); + IVShmemState *s = IVSHMEM_COMMON(pdev); int vector = entry - s->msi_vectors; EventNotifier *n = &s->peers[s->vm_id].eventfds[vector]; @@ -279,7 +293,7 @@ static void ivshmem_vector_notify(void *opaque) static int ivshmem_vector_unmask(PCIDevice *dev, unsigned vector, MSIMessage msg) { - IVShmemState *s = IVSHMEM(dev); + IVShmemState *s = IVSHMEM_COMMON(dev); EventNotifier *n = &s->peers[s->vm_id].eventfds[vector]; MSIVector *v = &s->msi_vectors[vector]; int ret; @@ -296,7 +310,7 @@ static int ivshmem_vector_unmask(PCIDevice *dev, unsigned vector, static void ivshmem_vector_mask(PCIDevice *dev, unsigned vector) { - IVShmemState *s = IVSHMEM(dev); + IVShmemState *s = IVSHMEM_COMMON(dev); EventNotifier *n = &s->peers[s->vm_id].eventfds[vector]; int ret; @@ -313,7 +327,7 @@ static void ivshmem_vector_poll(PCIDevice *dev, unsigned int vector_start, unsigned int vector_end) { - IVShmemState *s = IVSHMEM(dev); + IVShmemState *s = IVSHMEM_COMMON(dev); unsigned int vector; IVSHMEM_DPRINTF("vector poll %p %d-%d\n", dev, vector_start, vector_end); @@ -460,6 +474,7 @@ static void setup_interrupt(IVShmemState *s, int vector, Error **errp) static void process_msg_shmem(IVShmemState *s, int fd, Error **errp) { struct stat buf; + size_t size; void *ptr; if (s->ivshmem_bar2) { @@ -475,15 +490,21 @@ static void process_msg_shmem(IVShmemState *s, int fd, Error **errp) return; } - if (s->ivshmem_size > buf.st_size) { - error_setg(errp, "server sent only %zd bytes of shared memory", - (size_t)buf.st_size); - close(fd); - return; + size = buf.st_size; + + /* Legacy cruft */ + if (s->legacy_size != SIZE_MAX) { + if (size < s->legacy_size) { + error_setg(errp, "server sent only %zd bytes of shared memory", + (size_t)buf.st_size); + close(fd); + return; + } + size = s->legacy_size; } /* mmap the region and map into the BAR2 */ - ptr = mmap(0, s->ivshmem_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); + ptr = mmap(0, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); if (ptr == MAP_FAILED) { error_setg_errno(errp, errno, "Failed to mmap shared memory"); close(fd); @@ -491,7 +512,7 @@ static void process_msg_shmem(IVShmemState *s, int fd, Error **errp) } s->ivshmem_bar2 = g_new(MemoryRegion, 1); memory_region_init_ram_ptr(s->ivshmem_bar2, OBJECT(s), - "ivshmem.bar2", s->ivshmem_size, ptr); + "ivshmem.bar2", size, ptr); qemu_set_ram_fd(s->ivshmem_bar2->ram_addr, fd); } @@ -700,7 +721,7 @@ static void ivshmem_vector_use(IVShmemState *s) static void ivshmem_reset(DeviceState *d) { - IVShmemState *s = IVSHMEM(d); + IVShmemState *s = IVSHMEM_COMMON(d); s->intrstatus = 0; s->intrmask = 0; @@ -776,7 +797,7 @@ static void ivshmem_disable_irqfd(IVShmemState *s) static void ivshmem_write_config(PCIDevice *pdev, uint32_t address, uint32_t val, int len) { - IVShmemState *s = IVSHMEM(pdev); + IVShmemState *s = IVSHMEM_COMMON(pdev); int is_enabled, was_enabled = msix_enabled(pdev); pci_default_write_config(pdev, address, val, len); @@ -818,42 +839,14 @@ static HostMemoryBackend *desugar_shm(const char *shm, size_t size) return MEMORY_BACKEND(obj); } -static void pci_ivshmem_realize(PCIDevice *dev, Error **errp) +static void ivshmem_common_realize(PCIDevice *dev, Error **errp) { - IVShmemState *s = IVSHMEM(dev); + IVShmemState *s = IVSHMEM_COMMON(dev); Error *err = NULL; uint8_t *pci_conf; uint8_t attr = PCI_BASE_ADDRESS_SPACE_MEMORY | PCI_BASE_ADDRESS_MEM_PREFETCH; - if (!!s->server_chr + !!s->shmobj + !!s->hostmem != 1) { - error_setg(errp, - "You must specify either 'shm', 'chardev' or 'x-memdev'"); - return; - } - - if (s->hostmem) { - MemoryRegion *mr; - - if (s->sizearg) { - g_warning("size argument ignored with hostmem"); - } - - mr = host_memory_backend_get_memory(s->hostmem, &error_abort); - s->ivshmem_size = memory_region_size(mr); - } else if (s->sizearg == NULL) { - s->ivshmem_size = 4 << 20; /* 4 MB default */ - } else { - char *end; - int64_t size = qemu_strtosz(s->sizearg, &end); - if (size < 0 || (size_t)size != size || *end != '\0' - || !is_power_of_2(size)) { - error_setg(errp, "Invalid size %s", s->sizearg); - return; - } - s->ivshmem_size = size; - } - /* IRQFD requires MSI */ if (ivshmem_has_feature(s, IVSHMEM_IOEVENTFD) && !ivshmem_has_feature(s, IVSHMEM_MSI)) { @@ -861,29 +854,9 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp) return; } - /* check that role is reasonable */ - if (s->role) { - if (strncmp(s->role, "peer", 5) == 0) { - s->master = ON_OFF_AUTO_OFF; - } else if (strncmp(s->role, "master", 7) == 0) { - s->master = ON_OFF_AUTO_ON; - } else { - error_setg(errp, "'role' must be 'peer' or 'master'"); - return; - } - } else { - s->master = ON_OFF_AUTO_AUTO; - } - pci_conf = dev->config; pci_conf[PCI_COMMAND] = PCI_COMMAND_IO | PCI_COMMAND_MEMORY; - /* - * Note: we don't use INTx with IVSHMEM_MSI at all, so this is a - * bald-faced lie then. But it's a backwards compatible lie. - */ - pci_config_set_interrupt_pin(pci_conf, 1); - memory_region_init_io(&s->ivshmem_mmio, OBJECT(s), &ivshmem_mmio_ops, s, "ivshmem-mmio", IVSHMEM_REG_BAR_SIZE); @@ -891,14 +864,10 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp) pci_register_bar(dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->ivshmem_mmio); - if (s->ivshmem_64bit) { + if (!s->not_legacy_32bit) { attr |= PCI_BASE_ADDRESS_MEM_TYPE_64; } - if (s->shmobj) { - s->hostmem = desugar_shm(s->shmobj, s->ivshmem_size); - } - if (s->hostmem != NULL) { IVSHMEM_DPRINTF("using hostmem\n"); @@ -945,9 +914,68 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp) } } -static void pci_ivshmem_exit(PCIDevice *dev) +static void ivshmem_realize(PCIDevice *dev, Error **errp) { - IVShmemState *s = IVSHMEM(dev); + IVShmemState *s = IVSHMEM_COMMON(dev); + + if (!qtest_enabled()) { + error_report("ivshmem is deprecated, please use ivshmem-plain" + " or ivshmem-doorbell instead"); + } + + if (!!s->server_chr + !!s->shmobj + !!s->hostmem != 1) { + error_setg(errp, + "You must specify either 'shm', 'chardev' or 'x-memdev'"); + return; + } + + if (s->hostmem) { + if (s->sizearg) { + g_warning("size argument ignored with hostmem"); + } + } else if (s->sizearg == NULL) { + s->legacy_size = 4 << 20; /* 4 MB default */ + } else { + char *end; + int64_t size = qemu_strtosz(s->sizearg, &end); + if (size < 0 || (size_t)size != size || *end != '\0' + || !is_power_of_2(size)) { + error_setg(errp, "Invalid size %s", s->sizearg); + return; + } + s->legacy_size = size; + } + + /* check that role is reasonable */ + if (s->role) { + if (strncmp(s->role, "peer", 5) == 0) { + s->master = ON_OFF_AUTO_OFF; + } else if (strncmp(s->role, "master", 7) == 0) { + s->master = ON_OFF_AUTO_ON; + } else { + error_setg(errp, "'role' must be 'peer' or 'master'"); + return; + } + } else { + s->master = ON_OFF_AUTO_AUTO; + } + + if (s->shmobj) { + s->hostmem = desugar_shm(s->shmobj, s->legacy_size); + } + + /* + * Note: we don't use INTx with IVSHMEM_MSI at all, so this is a + * bald-faced lie then. But it's a backwards compatible lie. + */ + pci_config_set_interrupt_pin(dev->config, 1); + + ivshmem_common_realize(dev, errp); +} + +static void ivshmem_exit(PCIDevice *dev) +{ + IVShmemState *s = IVSHMEM_COMMON(dev); int i; if (s->migration_blocker) { @@ -959,7 +987,7 @@ static void pci_ivshmem_exit(PCIDevice *dev) if (!s->hostmem) { void *addr = memory_region_get_ram_ptr(s->ivshmem_bar2); - if (munmap(addr, s->ivshmem_size) == -1) { + if (munmap(addr, memory_region_size(s->ivshmem_bar2) == -1)) { error_report("Failed to munmap shared memory %s", strerror(errno)); } @@ -1074,28 +1102,39 @@ static Property ivshmem_properties[] = { DEFINE_PROP_BIT("msi", IVShmemState, features, IVSHMEM_MSI, true), DEFINE_PROP_STRING("shm", IVShmemState, shmobj), DEFINE_PROP_STRING("role", IVShmemState, role), - DEFINE_PROP_UINT32("use64", IVShmemState, ivshmem_64bit, 1), + DEFINE_PROP_UINT32("use64", IVShmemState, not_legacy_32bit, 1), DEFINE_PROP_END_OF_LIST(), }; -static void ivshmem_class_init(ObjectClass *klass, void *data) +static void ivshmem_common_class_init(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); PCIDeviceClass *k = PCI_DEVICE_CLASS(klass); - k->realize = pci_ivshmem_realize; - k->exit = pci_ivshmem_exit; + k->realize = ivshmem_common_realize; + k->exit = ivshmem_exit; k->config_write = ivshmem_write_config; k->vendor_id = PCI_VENDOR_ID_IVSHMEM; k->device_id = PCI_DEVICE_ID_IVSHMEM; k->class_id = PCI_CLASS_MEMORY_RAM; + k->revision = 1; dc->reset = ivshmem_reset; - dc->props = ivshmem_properties; - dc->vmsd = &ivshmem_vmsd; set_bit(DEVICE_CATEGORY_MISC, dc->categories); dc->desc = "Inter-VM shared memory"; } +static void ivshmem_class_init(ObjectClass *klass, void *data) +{ + DeviceClass *dc = DEVICE_CLASS(klass); + PCIDeviceClass *k = PCI_DEVICE_CLASS(klass); + + k->realize = ivshmem_realize; + k->revision = 0; + dc->desc = "Inter-VM shared memory (legacy)"; + dc->props = ivshmem_properties; + dc->vmsd = &ivshmem_vmsd; +} + static void ivshmem_check_memdev_is_busy(Object *obj, const char *name, Object *val, Error **errp) { @@ -1122,16 +1161,121 @@ static void ivshmem_init(Object *obj) &error_abort); } +static const TypeInfo ivshmem_common_info = { + .name = TYPE_IVSHMEM_COMMON, + .parent = TYPE_PCI_DEVICE, + .instance_size = sizeof(IVShmemState), + .abstract = true, + .class_init = ivshmem_common_class_init, +}; + static const TypeInfo ivshmem_info = { .name = TYPE_IVSHMEM, - .parent = TYPE_PCI_DEVICE, + .parent = TYPE_IVSHMEM_COMMON, .instance_size = sizeof(IVShmemState), .instance_init = ivshmem_init, .class_init = ivshmem_class_init, }; +static const VMStateDescription ivshmem_plain_vmsd = { + .name = TYPE_IVSHMEM_PLAIN, + .version_id = 0, + .minimum_version_id = 0, + .pre_load = ivshmem_pre_load, + .post_load = ivshmem_post_load, + .fields = (VMStateField[]) { + VMSTATE_PCI_DEVICE(parent_obj, IVShmemState), + VMSTATE_UINT32(intrstatus, IVShmemState), + VMSTATE_UINT32(intrmask, IVShmemState), + VMSTATE_END_OF_LIST() + }, +}; + +static Property ivshmem_plain_properties[] = { + DEFINE_PROP_ON_OFF_AUTO("master", IVShmemState, master, ON_OFF_AUTO_OFF), + DEFINE_PROP_END_OF_LIST(), +}; + +static void ivshmem_plain_init(Object *obj) +{ + IVShmemState *s = IVSHMEM_PLAIN(obj); + + object_property_add_link(obj, "memdev", TYPE_MEMORY_BACKEND, + (Object **)&s->hostmem, + ivshmem_check_memdev_is_busy, + OBJ_PROP_LINK_UNREF_ON_RELEASE, + &error_abort); +} + +static void ivshmem_plain_class_init(ObjectClass *klass, void *data) +{ + DeviceClass *dc = DEVICE_CLASS(klass); + + dc->props = ivshmem_plain_properties; + dc->vmsd = &ivshmem_plain_vmsd; +} + +static const TypeInfo ivshmem_plain_info = { + .name = TYPE_IVSHMEM_PLAIN, + .parent = TYPE_IVSHMEM_COMMON, + .instance_size = sizeof(IVShmemState), + .instance_init = ivshmem_plain_init, + .class_init = ivshmem_plain_class_init, +}; + +static const VMStateDescription ivshmem_doorbell_vmsd = { + .name = TYPE_IVSHMEM_DOORBELL, + .version_id = 0, + .minimum_version_id = 0, + .pre_load = ivshmem_pre_load, + .post_load = ivshmem_post_load, + .fields = (VMStateField[]) { + VMSTATE_PCI_DEVICE(parent_obj, IVShmemState), + VMSTATE_MSIX(parent_obj, IVShmemState), + VMSTATE_UINT32(intrstatus, IVShmemState), + VMSTATE_UINT32(intrmask, IVShmemState), + VMSTATE_END_OF_LIST() + }, +}; + +static Property ivshmem_doorbell_properties[] = { + DEFINE_PROP_CHR("chardev", IVShmemState, server_chr), + DEFINE_PROP_UINT32("vectors", IVShmemState, vectors, 1), + DEFINE_PROP_BIT("ioeventfd", IVShmemState, features, IVSHMEM_IOEVENTFD, + true), + DEFINE_PROP_ON_OFF_AUTO("master", IVShmemState, master, ON_OFF_AUTO_OFF), + DEFINE_PROP_END_OF_LIST(), +}; + +static void ivshmem_doorbell_init(Object *obj) +{ + IVShmemState *s = IVSHMEM_DOORBELL(obj); + + s->features |= (1 << IVSHMEM_MSI); + s->legacy_size = SIZE_MAX; /* whatever the server sends */ +} + +static void ivshmem_doorbell_class_init(ObjectClass *klass, void *data) +{ + DeviceClass *dc = DEVICE_CLASS(klass); + + dc->props = ivshmem_doorbell_properties; + dc->vmsd = &ivshmem_doorbell_vmsd; +} + +static const TypeInfo ivshmem_doorbell_info = { + .name = TYPE_IVSHMEM_DOORBELL, + .parent = TYPE_IVSHMEM_COMMON, + .instance_size = sizeof(IVShmemState), + .instance_init = ivshmem_doorbell_init, + .class_init = ivshmem_doorbell_class_init, +}; + static void ivshmem_register_types(void) { + type_register_static(&ivshmem_common_info); + type_register_static(&ivshmem_plain_info); + type_register_static(&ivshmem_doorbell_info); type_register_static(&ivshmem_info); } diff --git a/qemu-doc.texi b/qemu-doc.texi index 8afbbcd..0dd01c7 100644 --- a/qemu-doc.texi +++ b/qemu-doc.texi @@ -1262,13 +1262,18 @@ basic example. @subsection Inter-VM Shared Memory device -With KVM enabled on a Linux host, a shared memory device is available. Guests -map a POSIX shared memory region into the guest as a PCI device that enables -zero-copy communication to the application level of the guests. The basic -syntax is: +On Linux hosts, a shared memory device is available. The basic syntax +is: @example -qemu-system-i386 -device ivshmem,size=@var{size},shm=@var{shm-name} +qemu-system-x86_64 -device ivshmem-plain,memdev=@var{hostmem} +@end example + +where @var{hostmem} names a host memory backend. For a POSIX shared +memory backend, use something like + +@example +-object memory-backend-file,size=1M,share,mem-path=/dev/shm/ivshmem,id=@var{hostmem} @end example If desired, interrupts can be sent between guest VMs accessing the same shared @@ -1282,8 +1287,7 @@ memory server is: ivshmem-server -p @var{pidfile} -S @var{path} -m @var{shm-name} -l @var{shm-size} -n @var{vectors} # Then start your qemu instances with matching arguments -qemu-system-i386 -device ivshmem,size=@var{shm-size},vectors=@var{vectors},chardev=@var{id} - [,msi=on][,ioeventfd=on][,role=peer|master] +qemu-system-x86_64 -device ivshmem-doorbell,vectors=@var{vectors},chardev=@var{id} -chardev socket,path=@var{path},id=@var{id} @end example @@ -1291,12 +1295,11 @@ When using the server, the guest will be assigned a VM ID (>=0) that allows gues using the same server to communicate via interrupts. Guests can read their VM ID from a device register (see ivshmem-spec.txt). -The @option{role} argument can be set to either master or peer and will affect -how the shared memory is migrated. With @option{role=master}, the guest will -copy the shared memory on migration to the destination host. With -@option{role=peer}, the guest will not be able to migrate with the device attached. -With the @option{peer} case, the device should be detached and then reattached -after migration using the PCI hotplug support. +With device property @option{master=on}, the guest will copy the shared +memory on migration to the destination host. With @option{master=off}, +the guest will not be able to migrate with the device attached. In the +latter case, the device should be detached and then reattached after +migration using the PCI hotplug support. @subsubsection ivshmem and hugepages @@ -1304,8 +1307,8 @@ Instead of specifying the <shm size> using POSIX shm, you may specify a memory backend that has hugepage support: @example -qemu-system-i386 -object memory-backend-file,size=1G,mem-path=/dev/hugepages/my-shmem-file,share,id=mb1 - -device ivshmem,x-memdev=mb1 +qemu-system-x86_64 -object memory-backend-file,size=1G,mem-path=/dev/hugepages/my-shmem-file,share,id=mb1 + -device ivshmem-plain,memdev=mb1 @end example ivshmem-server also supports hugepages mount points with the diff --git a/tests/ivshmem-test.c b/tests/ivshmem-test.c index 68d6840..891b6b8 100644 --- a/tests/ivshmem-test.c +++ b/tests/ivshmem-test.c @@ -127,7 +127,9 @@ static void setup_vm_cmd(IVState *s, const char *cmd, bool msix) static void setup_vm(IVState *s) { - char *cmd = g_strdup_printf("-device ivshmem,shm=%s,size=1M", tmpshm); + char *cmd = g_strdup_printf("-object memory-backend-file" + ",id=mb1,size=1M,share,mem-path=/dev/shm%s" + " -device ivshmem-plain,memdev=mb1", tmpshm); setup_vm_cmd(s, cmd, false); @@ -284,8 +286,10 @@ static void *server_thread(void *data) static void setup_vm_with_server(IVState *s, int nvectors, bool msi) { char *cmd = g_strdup_printf("-chardev socket,id=chr0,path=%s,nowait " - "-device ivshmem,size=1M,chardev=chr0,vectors=%d,msi=%s", - tmpserver, nvectors, msi ? "true" : "false"); + "-device ivshmem%s,chardev=chr0,vectors=%d", + tmpserver, + msi ? "-doorbell" : ",size=1M,msi=off", + nvectors); setup_vm_cmd(s, cmd, msi); @@ -412,7 +416,7 @@ static void test_ivshmem_memdev(void) /* just for the sake of checking memory-backend property */ setup_vm_cmd(&state, "-object memory-backend-ram,size=1M,id=mb1" - " -device ivshmem,x-memdev=mb1", false); + " -device ivshmem-plain,memdev=mb1", false); cleanup_vm(&state); }
ivshmem can be configured with and without interrupt capability (a.k.a. "doorbell"). The two configurations have largely disjoint options, which makes for a confusing (and badly checked) user interface. Moreover, the device can't tell the guest whether its doorbell is enabled. Create two new device models ivshmem-plain and ivshmem-doorbell, and deprecate the old one. Changes from ivshmem: * PCI revision is 1 instead of 0. The new revision is fully backwards compatible for guests. Guests may elect to require at least revision 1 to make sure they're not exposed to the funny "no shared memory, yet" state. * Property "role" replaced by "master". role=master becomes master=on, role=peer becomes master=off. Default is off instead of auto. * Property "use64" is gone. The new devices always have 64 bit BARs. Changes from ivshmem to ivshmem-plain: * The Interrupt Pin register in PCI config space is zero (does not use an interrupt pin) instead of one (uses INTA). * Property "x-memdev" is renamed to "memdev". * Properties "shm" and "size" are gone. Use property "memdev" instead. * Property "msi" is gone. The new device can't have MSI-X capability. It can't interrupt anyway. * Properties "ioeventfd" and "vectors" are gone. They're meaningless without interrupts anyway. Changes from ivshmem to ivshmem-doorbell: * Property "msi" is gone. The new device always has MSI-X capability. * Property "ioeventfd" defaults to on instead of off. * Property "size" is gone. The new device can only map all the shared memory received from the server. Guests can easily find out whether the device is configured for interrupts by checking for MSI-X capability. Note: some code added in sub-optimal places to make the diff easier to review. The next commit will move it to more sensible places. Signed-off-by: Markus Armbruster <armbru@redhat.com> --- docs/specs/ivshmem-spec.txt | 66 +++++----- hw/misc/ivshmem.c | 310 ++++++++++++++++++++++++++++++++------------ qemu-doc.texi | 33 ++--- tests/ivshmem-test.c | 12 +- 4 files changed, 288 insertions(+), 133 deletions(-)