Message ID | 14c33104778e77fcf2e7f0df2a1dd96fbcaf49d7.1571905346.git.jag.raman@oracle.com (mailing list archive)
---|---
State | New, archived
Series | Initial support of multi-process qemu
On Thu, Oct 24, 2019 at 05:09:11AM -0400, Jagannathan Raman wrote:
> +static void broadcast_msg(MPQemuMsg *msg, bool need_reply)
> +{
> +    PCIProxyDev *entry;
> +    unsigned int pid;
> +    int wait;
> +
> +    QLIST_FOREACH(entry, &proxy_dev_list.devices, next) {
> +        if (need_reply) {
> +            wait = eventfd(0, EFD_NONBLOCK);
> +            msg->num_fds = 1;
> +            msg->fds[0] = wait;
> +        }
> +
> +        mpqemu_msg_send(entry->mpqemu_link, msg, entry->mpqemu_link->com);
> +        if (need_reply) {
> +            pid = (uint32_t)wait_for_remote(wait);

Sometimes QEMU really needs to wait for the remote process before it can make progress. I think this is not one of those cases though.

Since QEMU is event-driven it's problematic to invoke blocking system calls. The remote process might not respond for a significant amount of time. Other QEMU threads will be held up waiting for the QEMU global mutex in the meantime (because we hold it!).

Please implement heartbeat/ping asynchronously. The wait eventfd should be read by an event loop fd handler instead. That way QEMU can continue with running the VM while waiting for the remote process.

This will also improve guest performance because there will be less jitter (random latency because the event loop is held up waiting for remote processes for short periods of time).
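
For illustration, here is a minimal sketch of the event-loop approach suggested above. It assumes QEMU's qemu_set_fd_handler() (declared in "qemu/main-loop.h"), <sys/eventfd.h>, and a hypothetical wait_fd field added to PCIProxyDev; it is not part of the series:

/* Hypothetical fd handler: the main loop calls this when the remote
 * process writes to the ping eventfd, so nothing blocks in
 * wait_for_remote(). */
static void ping_reply_read(void *opaque)
{
    PCIProxyDev *entry = opaque;
    eventfd_t val;

    if (eventfd_read(entry->wait_fd, &val) == 0) {
        /* Reply received: stop watching and release the eventfd. */
        qemu_set_fd_handler(entry->wait_fd, NULL, NULL, NULL);
        close(entry->wait_fd);
        entry->wait_fd = -1;
    }
}

/* In broadcast_msg(), after mpqemu_msg_send(), instead of blocking in
 * wait_for_remote(wait): */
entry->wait_fd = wait;
qemu_set_fd_handler(wait, ping_reply_read, NULL, entry);

With the handler registered, broadcast_msg() returns immediately and the event loop wakes ping_reply_read() whenever the remote answers.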
On 11/11/2019 11:27 AM, Stefan Hajnoczi wrote:
> On Thu, Oct 24, 2019 at 05:09:11AM -0400, Jagannathan Raman wrote:
>> +static void broadcast_msg(MPQemuMsg *msg, bool need_reply)
>> +{
>> +    PCIProxyDev *entry;
>> +    unsigned int pid;
>> +    int wait;
>> +
>> +    QLIST_FOREACH(entry, &proxy_dev_list.devices, next) {
>> +        if (need_reply) {
>> +            wait = eventfd(0, EFD_NONBLOCK);
>> +            msg->num_fds = 1;
>> +            msg->fds[0] = wait;
>> +        }
>> +
>> +        mpqemu_msg_send(entry->mpqemu_link, msg, entry->mpqemu_link->com);
>> +        if (need_reply) {
>> +            pid = (uint32_t)wait_for_remote(wait);
>
> Sometimes QEMU really needs to wait for the remote process before it can make progress. I think this is not one of those cases though.
>
> Since QEMU is event-driven it's problematic to invoke blocking system calls. The remote process might not respond for a significant amount of time. Other QEMU threads will be held up waiting for the QEMU global mutex in the meantime (because we hold it!).

There are places where we wait synchronously for the remote process. However, these synchronous waits carry a timeout to prevent the hang situation you described above.

We will add error recovery in the future. That is, we will respawn the remote process if QEMU times out waiting for it.

>
> Please implement heartbeat/ping asynchronously. The wait eventfd should be read by an event loop fd handler instead. That way QEMU can continue with running the VM while waiting for the remote process.

In the current implementation, the heartbeat/ping is asynchronous. start_heartbeat_timer() sets up a timer to perform the ping.

Thanks!
--
Jag

>
> This will also improve guest performance because there will be less jitter (random latency because the event loop is held up waiting for remote processes for short periods of time).
>
On Wed, Nov 13, 2019 at 11:01:07AM -0500, Jag Raman wrote:
>
>
> On 11/11/2019 11:27 AM, Stefan Hajnoczi wrote:
> > On Thu, Oct 24, 2019 at 05:09:11AM -0400, Jagannathan Raman wrote:
> > > +static void broadcast_msg(MPQemuMsg *msg, bool need_reply)
> > > +{
> > > +    PCIProxyDev *entry;
> > > +    unsigned int pid;
> > > +    int wait;
> > > +
> > > +    QLIST_FOREACH(entry, &proxy_dev_list.devices, next) {
> > > +        if (need_reply) {
> > > +            wait = eventfd(0, EFD_NONBLOCK);
> > > +            msg->num_fds = 1;
> > > +            msg->fds[0] = wait;
> > > +        }
> > > +
> > > +        mpqemu_msg_send(entry->mpqemu_link, msg, entry->mpqemu_link->com);
> > > +        if (need_reply) {
> > > +            pid = (uint32_t)wait_for_remote(wait);
> >
> > Sometimes QEMU really needs to wait for the remote process before it can make progress. I think this is not one of those cases though.
> >
> > Since QEMU is event-driven it's problematic to invoke blocking system calls. The remote process might not respond for a significant amount of time. Other QEMU threads will be held up waiting for the QEMU global mutex in the meantime (because we hold it!).
>
> There are places where we wait synchronously for the remote process. However, these synchronous waits carry a timeout to prevent the hang situation you described above.
>
> We will add error recovery in the future. That is, we will respawn the remote process if QEMU times out waiting for it.

Even with a timeout, in the meantime the event loop is blocked. That means timers will be delayed by a large amount, the monitor will be unresponsive, etc.

> >
> > Please implement heartbeat/ping asynchronously. The wait eventfd should be read by an event loop fd handler instead. That way QEMU can continue with running the VM while waiting for the remote process.
>
> In the current implementation, the heartbeat/ping is asynchronous. start_heartbeat_timer() sets up a timer to perform the ping.

The heartbeat/ping is synchronous because broadcast_msg() blocks.

Stefan
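
One way to keep the timeout Jag describes without blocking the event loop is to let the next ping round double as the deadline check. A sketch under the same assumptions as the handler above (a hypothetical per-device wait_fd):

/* Hypothetical helper called from remote_ping() before the next
 * broadcast: a wait_fd that is still armed means the remote never
 * answered within one NOP_INTERVAL, so the timeout is detected from
 * the timer callback rather than by a blocking wait_for_remote(). */
static void reap_unanswered_pings(void)
{
    PCIProxyDev *entry;

    QLIST_FOREACH(entry, &proxy_dev_list.devices, next) {
        if (entry->wait_fd != -1) {
            qemu_set_fd_handler(entry->wait_fd, NULL, NULL, NULL);
            close(entry->wait_fd);
            entry->wait_fd = -1;
            /* TODO: error recovery, e.g. respawn the remote process. */
        }
    }
}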
diff --git a/hw/proxy/qemu-proxy.c b/hw/proxy/qemu-proxy.c
index fc1c731..691b991 100644
--- a/hw/proxy/qemu-proxy.c
+++ b/hw/proxy/qemu-proxy.c
@@ -53,14 +53,96 @@
 #include "hw/boards.h"
 #include "include/qemu/log.h"
 
+QEMUTimer *hb_timer;
 static void pci_proxy_dev_realize(PCIDevice *dev, Error **errp);
 static void setup_irqfd(PCIProxyDev *dev);
+static void pci_dev_exit(PCIDevice *dev);
+static void start_heartbeat_timer(void);
+static void stop_heartbeat_timer(void);
+static void childsig_handler(int sig, siginfo_t *siginfo, void *ctx);
+static void broadcast_msg(MPQemuMsg *msg, bool need_reply);
+
+static void childsig_handler(int sig, siginfo_t *siginfo, void *ctx)
+{
+    /* TODO: Add proper handler. */
+    printf("Child (pid %d) is dead? Signal is %d, Exit code is %d.\n",
+           siginfo->si_pid, siginfo->si_signo, siginfo->si_code);
+}
+
+static void broadcast_msg(MPQemuMsg *msg, bool need_reply)
+{
+    PCIProxyDev *entry;
+    unsigned int pid;
+    int wait;
+
+    QLIST_FOREACH(entry, &proxy_dev_list.devices, next) {
+        if (need_reply) {
+            wait = eventfd(0, EFD_NONBLOCK);
+            msg->num_fds = 1;
+            msg->fds[0] = wait;
+        }
+
+        mpqemu_msg_send(entry->mpqemu_link, msg, entry->mpqemu_link->com);
+        if (need_reply) {
+            pid = (uint32_t)wait_for_remote(wait);
+            close(wait);
+            /* TODO: Add proper handling. */
+            if (pid) {
+                need_reply = 0;
+            }
+        }
+    }
+}
+
+#define NOP_INTERVAL 1000000
+
+static void remote_ping(void *opaque)
+{
+    MPQemuMsg msg;
+
+    memset(&msg, 0, sizeof(MPQemuMsg));
+
+    msg.num_fds = 0;
+    msg.cmd = PROXY_PING;
+    msg.bytestream = 0;
+    msg.size = 0;
+
+    broadcast_msg(&msg, true);
+    timer_mod(hb_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + NOP_INTERVAL);
+
+}
+
+void start_heartbeat_timer(void)
+{
+    hb_timer = timer_new_ms(QEMU_CLOCK_VIRTUAL,
+                            remote_ping,
+                            &proxy_dev_list);
+    timer_mod(hb_timer, qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + NOP_INTERVAL);
+
+}
+
+static void stop_heartbeat_timer(void)
+{
+    timer_del(hb_timer);
+    timer_free(hb_timer);
+}
+
+static void set_sigchld_handler(void)
+{
+    struct sigaction sa_sigterm;
+    memset(&sa_sigterm, 0, sizeof(sa_sigterm));
+    sa_sigterm.sa_sigaction = childsig_handler;
+    sa_sigterm.sa_flags = SA_SIGINFO | SA_NOCLDWAIT | SA_NOCLDSTOP;
+    sigaction(SIGCHLD, &sa_sigterm, NULL);
+}
 
 static void proxy_ready(PCIDevice *dev)
 {
     PCIProxyDev *pdev = PCI_PROXY_DEV(dev);
 
     setup_irqfd(pdev);
+    set_sigchld_handler();
+    start_heartbeat_timer();
 }
 
 static void set_remote_opts(PCIDevice *dev, QDict *qdict, unsigned int cmd)
@@ -259,6 +341,7 @@ static void pci_proxy_dev_class_init(ObjectClass *klass, void *data)
     PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
 
     k->realize = pci_proxy_dev_realize;
+    k->exit = pci_dev_exit;
     k->config_read = pci_proxy_read_config;
     k->config_write = pci_proxy_write_config;
 }
@@ -397,6 +480,24 @@ static void pci_proxy_dev_realize(PCIDevice *device, Error **errp)
     dev->proxy_ready = proxy_ready;
 }
 
+static void pci_dev_exit(PCIDevice *pdev)
+{
+    PCIProxyDev *entry, *sentry;
+    PCIProxyDev *dev = PCI_PROXY_DEV(pdev);
+
+    stop_heartbeat_timer();
+
+    QLIST_FOREACH_SAFE(entry, &proxy_dev_list.devices, next, sentry) {
+        if (entry->remote_pid == dev->remote_pid) {
+            QLIST_REMOVE(entry, next);
+        }
+    }
+
+    if (!QLIST_EMPTY(&proxy_dev_list.devices)) {
+        start_heartbeat_timer();
+    }
+}
+
 static void send_bar_access_msg(PCIProxyDev *dev, MemoryRegion *mr,
                                 bool write, hwaddr addr, uint64_t *val,
                                 unsigned size, bool memory)
diff --git a/include/io/mpqemu-link.h b/include/io/mpqemu-link.h
index 3145b0e..16a913b 100644
--- a/include/io/mpqemu-link.h
+++ b/include/io/mpqemu-link.h
@@ -72,6 +72,7 @@ typedef enum {
     DRIVE_OPTS,
     DEVICE_ADD,
     DEVICE_DEL,
+    PROXY_PING,
     MAX,
 } mpqemu_cmd_t;
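
For completeness, the remote side presumably acknowledges PROXY_PING by writing to the eventfd that the proxy attached to the message. A hypothetical sketch of such a handler; the actual dispatch code and reply value in the series may differ:

#include <sys/eventfd.h>

/* Hypothetical remote-side handler: signal the eventfd received in
 * msg->fds[0], which wakes the proxy's eventfd reader (wait_for_remote()
 * today, an fd handler in the asynchronous variant).  The value 1 is
 * only a placeholder acknowledgement. */
static void process_ping_msg(MPQemuMsg *msg)
{
    if (msg->num_fds == 1) {
        eventfd_write(msg->fds[0], 1);
        close(msg->fds[0]);
    }
}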