Message ID | 1456771254-17511-24-git-send-email-armbru@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hi On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote: > When configured for interrupts (property "chardev" given), we receive > the shared memory from an ivshmem server. We do so asynchronously > after realize() completes, by setting up callbacks with > qemu_chr_add_handlers(). > > Keeping server I/O out of realize() that way avoids delays due to a > slow server. This is probably relevant only for hot plug. > > However, this funny "no shared memory, yet" state of the device also > causes a raft of issues that are hard or impossible to work around: > > * The guest is exposed to this state: when we enter and leave it its > shared memory contents is apruptly replaced, and device register > IVPosition changes. > > This is a known issue. We document that guests should not access > the shared memory after device initialization until the IVPosition > register becomes non-negative. > > For cold plug, the funny state is unlikely to be visible in > practice, because we normally receive the shared memory long before > the guest gets around to mess with the device. > > For hot plug, the timing is tighter, but the relative slowness of > PCI device configuration has a good chance to hide the funny state. > > In either case, guests complying with the documented procedure are > safe. > > * Migration becomes racy. > > If migration completes before the shared memory setup completes on > the source, shared memory contents is silently lost. Fortunately, > migration is rather unlikely to win this race. > > If the shared memory's ramblock arrives at the destination before > shared memory setup completes, migration fails. > > There is no known way for a management application to wait for > shared memory setup to complete. > > All you can do is retry failed migration. You can improve your > chances by leaving more time between running the destination QEMU > and the migrate command. > > To mitigate silent memory loss, you need to ensure the server > initializes shared memory exactly the same on source and > destination. > > These issues are entirely undocumented so far. > > I'd expect the server to be almost always fast enough to hide these > issues. But then rare catastrophic races are in a way the worst kind. > > This is way more trouble than I'm willing to take from any device. > Kill the funny state by receiving shared memory synchronously in > realize(). If your hot plug hangs, go kill your ivshmem server. > > For easier review, this commit only makes the receive synchronous, it > doesn't add the necessary error propagation. Without that, the funny > state persists. The next commit will do that, and kill it off for > real. > > Signed-off-by: Markus Armbruster <armbru@redhat.com> > --- > hw/misc/ivshmem.c | 70 +++++++++++++++++++++++++++++++++++++--------------- > tests/ivshmem-test.c | 26 ++++++------------- > 2 files changed, 57 insertions(+), 39 deletions(-) > > diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c > index c366087..352937f 100644 > --- a/hw/misc/ivshmem.c > +++ b/hw/misc/ivshmem.c > @@ -676,27 +676,47 @@ static void ivshmem_read(void *opaque, const uint8_t *buf, int size) > process_msg(s, incoming_posn, incoming_fd); > } > > -static void ivshmem_check_version(void *opaque, const uint8_t * buf, int size) > +static int64_t ivshmem_recv_msg(IVShmemState *s, int *pfd) > { > - IVShmemState *s = opaque; > - int tmp; > - int64_t version; > + int64_t msg; > + int n, ret; > > - if (!fifo_update_and_get_i64(s, buf, size, &version)) { > - return; > - } > + n = 0; > + do { > + ret = qemu_chr_fe_read_all(s->server_chr, (uint8_t *)&msg + n, > + sizeof(msg) - n); > + if (ret < 0 && ret != -EINTR) { > + /* TODO error handling */ > + return INT64_MIN; > + } > + n += ret; > + } while (n < sizeof(msg)); > > - tmp = qemu_chr_fe_get_msgfd(s->server_chr); > - if (tmp != -1 || version != IVSHMEM_PROTOCOL_VERSION) { > + *pfd = qemu_chr_fe_get_msgfd(s->server_chr); > + return msg; > +} > + > +static void ivshmem_recv_setup(IVShmemState *s) > +{ > + int64_t msg; > + int fd; > + > + msg = ivshmem_recv_msg(s, &fd); > + if (fd != -1 || msg != IVSHMEM_PROTOCOL_VERSION) { > fprintf(stderr, "incompatible version, you are connecting to a ivshmem-" > "server using a different protocol please check your setup\n"); > - qemu_chr_add_handlers(s->server_chr, NULL, NULL, NULL, s); > return; > } > > - IVSHMEM_DPRINTF("version check ok, switch to real chardev handler\n"); > - qemu_chr_add_handlers(s->server_chr, ivshmem_can_receive, ivshmem_read, > - NULL, s); > + /* > + * Receive more messages until we got shared memory. > + */ > + do { > + msg = ivshmem_recv_msg(s, &fd); > + process_msg(s, msg, fd); > + } while (msg != -1); > + > + assert(memory_region_is_mapped(&s->ivshmem)); It is easy to trigger at runtime, I suggest to report an error instead. looks good otherwise > } > > /* Select the MSI-X vectors used by device. > @@ -903,19 +923,29 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp) > IVSHMEM_DPRINTF("using shared memory server (socket = %s)\n", > s->server_chr->filename); > > - if (ivshmem_setup_interrupts(s) < 0) { > - error_setg(errp, "failed to initialize interrupts"); > - return; > - } > - > /* we allocate enough space for 16 peers and grow as needed */ > resize_peers(s, 16); > s->vm_id = -1; > > pci_register_bar(dev, 2, attr, &s->bar); > > - qemu_chr_add_handlers(s->server_chr, ivshmem_can_receive, > - ivshmem_check_version, NULL, s); > + /* > + * Receive setup messages from server synchronously. > + * Older versions did it asynchronously, but that creates a > + * number of entertaining race conditions. > + * TODO Propagate errors! Without that, we still have races > + * on errors. > + */ > + ivshmem_recv_setup(s); > + if (memory_region_is_mapped(&s->ivshmem)) { > + qemu_chr_add_handlers(s->server_chr, ivshmem_can_receive, > + ivshmem_read, NULL, s); > + } > + > + if (ivshmem_setup_interrupts(s) < 0) { > + error_setg(errp, "failed to initialize interrupts"); > + return; > + } > } else { > /* just map the file immediately, we're not using a server */ > int fd; > diff --git a/tests/ivshmem-test.c b/tests/ivshmem-test.c > index c1dd7bb..68d6840 100644 > --- a/tests/ivshmem-test.c > +++ b/tests/ivshmem-test.c > @@ -309,35 +309,23 @@ static void test_ivshmem_server(bool msi) > ret = ivshmem_server_start(&server); > g_assert_cmpint(ret, ==, 0); > > - setup_vm_with_server(&state1, nvectors, msi); > - s1 = &state1; > - setup_vm_with_server(&state2, nvectors, msi); > - s2 = &state2; > - > - /* check state before server sends stuff */ > - g_assert_cmpuint(in_reg(s1, IVPOSITION), ==, 0xffffffff); > - g_assert_cmpuint(in_reg(s2, IVPOSITION), ==, 0xffffffff); > - g_assert_cmpuint(qtest_readb(s1->qtest, (uintptr_t)s1->mem_base), ==, 0x00); > - > thread.server = &server; > ret = pipe(thread.pipe); > g_assert_cmpint(ret, ==, 0); > thread.thread = g_thread_new("ivshmem-server", server_thread, &thread); > g_assert(thread.thread != NULL); > > - /* waiting for devices to become operational */ > - while (g_get_monotonic_time() < end_time) { > - g_usleep(1000); > - if ((int)in_reg(s1, IVPOSITION) >= 0 && > - (int)in_reg(s2, IVPOSITION) >= 0) { > - break; > - } > - } > + setup_vm_with_server(&state1, nvectors, msi); > + s1 = &state1; > + setup_vm_with_server(&state2, nvectors, msi); > + s2 = &state2; > > /* check got different VM ids */ > vm1 = in_reg(s1, IVPOSITION); > vm2 = in_reg(s2, IVPOSITION); > - g_assert_cmpuint(vm1, !=, vm2); > + g_assert_cmpint(vm1, >=, 0); > + g_assert_cmpint(vm2, >=, 0); > + g_assert_cmpint(vm1, !=, vm2); > > /* check number of MSI-X vectors */ > global_qtest = s1->qtest; > -- > 2.4.3 > >
Marc-André Lureau <marcandre.lureau@gmail.com> writes: > Hi > > On Mon, Feb 29, 2016 at 7:40 PM, Markus Armbruster <armbru@redhat.com> wrote: >> When configured for interrupts (property "chardev" given), we receive >> the shared memory from an ivshmem server. We do so asynchronously >> after realize() completes, by setting up callbacks with >> qemu_chr_add_handlers(). >> >> Keeping server I/O out of realize() that way avoids delays due to a >> slow server. This is probably relevant only for hot plug. >> >> However, this funny "no shared memory, yet" state of the device also >> causes a raft of issues that are hard or impossible to work around: >> >> * The guest is exposed to this state: when we enter and leave it its >> shared memory contents is apruptly replaced, and device register >> IVPosition changes. >> >> This is a known issue. We document that guests should not access >> the shared memory after device initialization until the IVPosition >> register becomes non-negative. >> >> For cold plug, the funny state is unlikely to be visible in >> practice, because we normally receive the shared memory long before >> the guest gets around to mess with the device. >> >> For hot plug, the timing is tighter, but the relative slowness of >> PCI device configuration has a good chance to hide the funny state. >> >> In either case, guests complying with the documented procedure are >> safe. >> >> * Migration becomes racy. >> >> If migration completes before the shared memory setup completes on >> the source, shared memory contents is silently lost. Fortunately, >> migration is rather unlikely to win this race. >> >> If the shared memory's ramblock arrives at the destination before >> shared memory setup completes, migration fails. >> >> There is no known way for a management application to wait for >> shared memory setup to complete. >> >> All you can do is retry failed migration. You can improve your >> chances by leaving more time between running the destination QEMU >> and the migrate command. >> >> To mitigate silent memory loss, you need to ensure the server >> initializes shared memory exactly the same on source and >> destination. >> >> These issues are entirely undocumented so far. >> >> I'd expect the server to be almost always fast enough to hide these >> issues. But then rare catastrophic races are in a way the worst kind. >> >> This is way more trouble than I'm willing to take from any device. >> Kill the funny state by receiving shared memory synchronously in >> realize(). If your hot plug hangs, go kill your ivshmem server. >> >> For easier review, this commit only makes the receive synchronous, it >> doesn't add the necessary error propagation. Without that, the funny >> state persists. The next commit will do that, and kill it off for >> real. >> >> Signed-off-by: Markus Armbruster <armbru@redhat.com> >> --- >> hw/misc/ivshmem.c | 70 +++++++++++++++++++++++++++++++++++++--------------- >> tests/ivshmem-test.c | 26 ++++++------------- >> 2 files changed, 57 insertions(+), 39 deletions(-) >> >> diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c >> index c366087..352937f 100644 >> --- a/hw/misc/ivshmem.c >> +++ b/hw/misc/ivshmem.c >> @@ -676,27 +676,47 @@ static void ivshmem_read(void *opaque, const uint8_t *buf, int size) >> process_msg(s, incoming_posn, incoming_fd); >> } >> >> -static void ivshmem_check_version(void *opaque, const uint8_t * buf, int size) >> +static int64_t ivshmem_recv_msg(IVShmemState *s, int *pfd) >> { >> - IVShmemState *s = opaque; >> - int tmp; >> - int64_t version; >> + int64_t msg; >> + int n, ret; >> >> - if (!fifo_update_and_get_i64(s, buf, size, &version)) { >> - return; >> - } >> + n = 0; >> + do { >> + ret = qemu_chr_fe_read_all(s->server_chr, (uint8_t *)&msg + n, >> + sizeof(msg) - n); >> + if (ret < 0 && ret != -EINTR) { >> + /* TODO error handling */ >> + return INT64_MIN; >> + } >> + n += ret; >> + } while (n < sizeof(msg)); >> >> - tmp = qemu_chr_fe_get_msgfd(s->server_chr); >> - if (tmp != -1 || version != IVSHMEM_PROTOCOL_VERSION) { >> + *pfd = qemu_chr_fe_get_msgfd(s->server_chr); >> + return msg; >> +} >> + >> +static void ivshmem_recv_setup(IVShmemState *s) >> +{ >> + int64_t msg; >> + int fd; >> + >> + msg = ivshmem_recv_msg(s, &fd); >> + if (fd != -1 || msg != IVSHMEM_PROTOCOL_VERSION) { >> fprintf(stderr, "incompatible version, you are connecting to a ivshmem-" >> "server using a different protocol please check your setup\n"); >> - qemu_chr_add_handlers(s->server_chr, NULL, NULL, NULL, s); >> return; >> } >> >> - IVSHMEM_DPRINTF("version check ok, switch to real chardev handler\n"); >> - qemu_chr_add_handlers(s->server_chr, ivshmem_can_receive, ivshmem_read, >> - NULL, s); >> + /* >> + * Receive more messages until we got shared memory. >> + */ >> + do { >> + msg = ivshmem_recv_msg(s, &fd); >> + process_msg(s, msg, fd); >> + } while (msg != -1); >> + >> + assert(memory_region_is_mapped(&s->ivshmem)); > > It is easy to trigger at runtime, I suggest to report an error instead. On successful return, the shared memory must be mapped. To ensure that, the code receives messages until it is. The assertion double-checks the code got it right. However, you're right in that in this half-finished state, the assertion is problematic: since we don't yet propagate errors, we take the success path even when process_msg_shmem() failed. I guess I'll have to turn the assertion into a comment until the next patch. > looks good otherwise Thanks!
diff --git a/hw/misc/ivshmem.c b/hw/misc/ivshmem.c index c366087..352937f 100644 --- a/hw/misc/ivshmem.c +++ b/hw/misc/ivshmem.c @@ -676,27 +676,47 @@ static void ivshmem_read(void *opaque, const uint8_t *buf, int size) process_msg(s, incoming_posn, incoming_fd); } -static void ivshmem_check_version(void *opaque, const uint8_t * buf, int size) +static int64_t ivshmem_recv_msg(IVShmemState *s, int *pfd) { - IVShmemState *s = opaque; - int tmp; - int64_t version; + int64_t msg; + int n, ret; - if (!fifo_update_and_get_i64(s, buf, size, &version)) { - return; - } + n = 0; + do { + ret = qemu_chr_fe_read_all(s->server_chr, (uint8_t *)&msg + n, + sizeof(msg) - n); + if (ret < 0 && ret != -EINTR) { + /* TODO error handling */ + return INT64_MIN; + } + n += ret; + } while (n < sizeof(msg)); - tmp = qemu_chr_fe_get_msgfd(s->server_chr); - if (tmp != -1 || version != IVSHMEM_PROTOCOL_VERSION) { + *pfd = qemu_chr_fe_get_msgfd(s->server_chr); + return msg; +} + +static void ivshmem_recv_setup(IVShmemState *s) +{ + int64_t msg; + int fd; + + msg = ivshmem_recv_msg(s, &fd); + if (fd != -1 || msg != IVSHMEM_PROTOCOL_VERSION) { fprintf(stderr, "incompatible version, you are connecting to a ivshmem-" "server using a different protocol please check your setup\n"); - qemu_chr_add_handlers(s->server_chr, NULL, NULL, NULL, s); return; } - IVSHMEM_DPRINTF("version check ok, switch to real chardev handler\n"); - qemu_chr_add_handlers(s->server_chr, ivshmem_can_receive, ivshmem_read, - NULL, s); + /* + * Receive more messages until we got shared memory. + */ + do { + msg = ivshmem_recv_msg(s, &fd); + process_msg(s, msg, fd); + } while (msg != -1); + + assert(memory_region_is_mapped(&s->ivshmem)); } /* Select the MSI-X vectors used by device. @@ -903,19 +923,29 @@ static void pci_ivshmem_realize(PCIDevice *dev, Error **errp) IVSHMEM_DPRINTF("using shared memory server (socket = %s)\n", s->server_chr->filename); - if (ivshmem_setup_interrupts(s) < 0) { - error_setg(errp, "failed to initialize interrupts"); - return; - } - /* we allocate enough space for 16 peers and grow as needed */ resize_peers(s, 16); s->vm_id = -1; pci_register_bar(dev, 2, attr, &s->bar); - qemu_chr_add_handlers(s->server_chr, ivshmem_can_receive, - ivshmem_check_version, NULL, s); + /* + * Receive setup messages from server synchronously. + * Older versions did it asynchronously, but that creates a + * number of entertaining race conditions. + * TODO Propagate errors! Without that, we still have races + * on errors. + */ + ivshmem_recv_setup(s); + if (memory_region_is_mapped(&s->ivshmem)) { + qemu_chr_add_handlers(s->server_chr, ivshmem_can_receive, + ivshmem_read, NULL, s); + } + + if (ivshmem_setup_interrupts(s) < 0) { + error_setg(errp, "failed to initialize interrupts"); + return; + } } else { /* just map the file immediately, we're not using a server */ int fd; diff --git a/tests/ivshmem-test.c b/tests/ivshmem-test.c index c1dd7bb..68d6840 100644 --- a/tests/ivshmem-test.c +++ b/tests/ivshmem-test.c @@ -309,35 +309,23 @@ static void test_ivshmem_server(bool msi) ret = ivshmem_server_start(&server); g_assert_cmpint(ret, ==, 0); - setup_vm_with_server(&state1, nvectors, msi); - s1 = &state1; - setup_vm_with_server(&state2, nvectors, msi); - s2 = &state2; - - /* check state before server sends stuff */ - g_assert_cmpuint(in_reg(s1, IVPOSITION), ==, 0xffffffff); - g_assert_cmpuint(in_reg(s2, IVPOSITION), ==, 0xffffffff); - g_assert_cmpuint(qtest_readb(s1->qtest, (uintptr_t)s1->mem_base), ==, 0x00); - thread.server = &server; ret = pipe(thread.pipe); g_assert_cmpint(ret, ==, 0); thread.thread = g_thread_new("ivshmem-server", server_thread, &thread); g_assert(thread.thread != NULL); - /* waiting for devices to become operational */ - while (g_get_monotonic_time() < end_time) { - g_usleep(1000); - if ((int)in_reg(s1, IVPOSITION) >= 0 && - (int)in_reg(s2, IVPOSITION) >= 0) { - break; - } - } + setup_vm_with_server(&state1, nvectors, msi); + s1 = &state1; + setup_vm_with_server(&state2, nvectors, msi); + s2 = &state2; /* check got different VM ids */ vm1 = in_reg(s1, IVPOSITION); vm2 = in_reg(s2, IVPOSITION); - g_assert_cmpuint(vm1, !=, vm2); + g_assert_cmpint(vm1, >=, 0); + g_assert_cmpint(vm2, >=, 0); + g_assert_cmpint(vm1, !=, vm2); /* check number of MSI-X vectors */ global_qtest = s1->qtest;
When configured for interrupts (property "chardev" given), we receive the shared memory from an ivshmem server. We do so asynchronously after realize() completes, by setting up callbacks with qemu_chr_add_handlers(). Keeping server I/O out of realize() that way avoids delays due to a slow server. This is probably relevant only for hot plug. However, this funny "no shared memory, yet" state of the device also causes a raft of issues that are hard or impossible to work around: * The guest is exposed to this state: when we enter and leave it its shared memory contents is apruptly replaced, and device register IVPosition changes. This is a known issue. We document that guests should not access the shared memory after device initialization until the IVPosition register becomes non-negative. For cold plug, the funny state is unlikely to be visible in practice, because we normally receive the shared memory long before the guest gets around to mess with the device. For hot plug, the timing is tighter, but the relative slowness of PCI device configuration has a good chance to hide the funny state. In either case, guests complying with the documented procedure are safe. * Migration becomes racy. If migration completes before the shared memory setup completes on the source, shared memory contents is silently lost. Fortunately, migration is rather unlikely to win this race. If the shared memory's ramblock arrives at the destination before shared memory setup completes, migration fails. There is no known way for a management application to wait for shared memory setup to complete. All you can do is retry failed migration. You can improve your chances by leaving more time between running the destination QEMU and the migrate command. To mitigate silent memory loss, you need to ensure the server initializes shared memory exactly the same on source and destination. These issues are entirely undocumented so far. I'd expect the server to be almost always fast enough to hide these issues. But then rare catastrophic races are in a way the worst kind. This is way more trouble than I'm willing to take from any device. Kill the funny state by receiving shared memory synchronously in realize(). If your hot plug hangs, go kill your ivshmem server. For easier review, this commit only makes the receive synchronous, it doesn't add the necessary error propagation. Without that, the funny state persists. The next commit will do that, and kill it off for real. Signed-off-by: Markus Armbruster <armbru@redhat.com> --- hw/misc/ivshmem.c | 70 +++++++++++++++++++++++++++++++++++++--------------- tests/ivshmem-test.c | 26 ++++++------------- 2 files changed, 57 insertions(+), 39 deletions(-)