Message ID | 1241040038-17183-4-git-send-email-aliguori@us.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Anthony Liguori wrote: > We don't use signalfd in upstream QEMU. Instead, we always emulate it. > With an extra thread -> so an extra context switch. > It's not necessarily a bad thing to use signalfd, but this is something that > should be done upstream. It certainly does qemu-kvm no harm to use the upstream > code. > It will introduce a (likely minor, but real) performance regression. Instead of this, why not apply the reverse patch to qemu.git?
Avi Kivity wrote: > Anthony Liguori wrote: >> We don't use signalfd in upstream QEMU. Instead, we always emulate it. >> > > With an extra thread -> so an extra context switch. We don't use an extra thread. We just install a signal handler that writes to a pipe. At best, the added overhead is that we get EINTRs more often but this is something we already handle. >> It's not necessarily a bad thing to use signalfd, but this is >> something that >> should be done upstream. It certainly does qemu-kvm no harm to use >> the upstream >> code. >> > > It will introduce a (likely minor, but real) performance regression. > > Instead of this, why not apply the reverse patch to qemu.git? I'm not sure signalfd really buys us much. To emulate it requires writing a bunch more data to the pipe. When writing more than 1 byte, we have to worry about whether there's a partial write because the pipe buffers full). We also have to make sure to read from the fd in properly sized chunks. Regards, Anthony Liguori -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Anthony Liguori wrote: > Avi Kivity wrote: >> Anthony Liguori wrote: >>> We don't use signalfd in upstream QEMU. Instead, we always emulate it. >>> >> >> With an extra thread -> so an extra context switch. > > We don't use an extra thread. We just install a signal handler that > writes to a pipe. At best, the added overhead is that we get EINTRs > more often but this is something we already handle. Oh okay. But signal delivery is slow; for example the FPU needs to be reset. > I'm not sure signalfd really buys us much. To emulate it requires > writing a bunch more data to the pipe. When writing more than 1 byte, > we have to worry about whether there's a partial write because the > pipe buffers full). We also have to make sure to read from the fd in > properly sized chunks. Then we can use one byte writes (and reads) when signalfd is not available. 128 byte pipe read/writes should always be atomic on Linux though, likely on other OSes too.
Avi Kivity wrote: > Anthony Liguori wrote: >> Avi Kivity wrote: >>> Anthony Liguori wrote: >>>> We don't use signalfd in upstream QEMU. Instead, we always emulate >>>> it. >>>> >>> >>> With an extra thread -> so an extra context switch. >> >> We don't use an extra thread. We just install a signal handler that >> writes to a pipe. At best, the added overhead is that we get EINTRs >> more often but this is something we already handle. > > Oh okay. But signal delivery is slow; for example the FPU needs to be > reset. Is it really justified to add all of this extra code (including signalfd emulation) for something that probably isn't even measurable? I like using wiz-bang features of Linux as much as the next guy, but I think we're stretching to justify it here :-)
Anthony Liguori wrote: >> >> Oh okay. But signal delivery is slow; for example the FPU needs to >> be reset. > > Is it really justified to add all of this extra code (including > signalfd emulation) for something that probably isn't even measurable? We don't have to add signalfd emulation; we can simply use signal+pipe in that case. We won't know if it's measurable or not until we measure it (or not). > > I like using wiz-bang features of Linux as much as the next guy, but I > think we're stretching to justify it here :-) > I think it's worth it in this case. It will become more important in time, too.
Avi Kivity wrote: > Anthony Liguori wrote: >>> >>> Oh okay. But signal delivery is slow; for example the FPU needs to >>> be reset. >> >> Is it really justified to add all of this extra code (including >> signalfd emulation) for something that probably isn't even measurable? > > We don't have to add signalfd emulation; we can simply use signal+pipe > in that case. > > We won't know if it's measurable or not until we measure it (or not). > >> >> I like using wiz-bang features of Linux as much as the next guy, but >> I think we're stretching to justify it here :-) >> > > I think it's worth it in this case. It will become more important in > time, too. > Out of curiosity, I measured this: [avi@balrog test]$ ./signal 2777 ns/signal (pipe) 844 ns/signal (signalfd) At 10000 signals/sec, this will save about 2% cpu time. It's definitely worthwhile for the handful of lines it takes. test program: #include <signal.h> #include <unistd.h> #include <sys/signalfd.h> #include <sys/time.h> #include <stdio.h> static int wfd; static void handler(int signum) { char b; write(wfd, &b, 1); } static int create_pipe(void) { int fd[2]; sigset_t s; pipe(fd); wfd = fd[1]; signal(SIGUSR1, handler); sigemptyset(&s); sigaddset(&s, SIGUSR1); sigprocmask(SIG_UNBLOCK, &s, NULL); return fd[0]; } static int create_signalfd(void) { sigset_t s; sigemptyset(&s); sigaddset(&s, SIGUSR1); sigprocmask(SIG_BLOCK, &s, NULL); return signalfd(-1, &s, 0); } static uint64_t time_usec(void) { struct timeval tv; gettimeofday(&tv, NULL); return (uint64_t)tv.tv_sec * 1000000 + tv.tv_usec; } #define N 10000000 static void test(const char *name, int fd, int len) { int i; uint64_t t1, t2; char buf[128]; t1 = time_usec(); for (i = 0; i < N; ++i) { raise(SIGUSR1); read(fd, buf, len); } t2 = time_usec(); close(fd); printf("%5d ns/signal (%s)\n", 1000 * (t2 - t1) / N, name); } int main(int ac, char **av) { test("pipe", create_pipe(), 1); test("signalfd", create_signalfd(), 128); return 0; }
diff --git a/block-raw-posix.c b/block-raw-posix.c index f033bae..822839f 100644 --- a/block-raw-posix.c +++ b/block-raw-posix.c @@ -25,7 +25,6 @@ #include "qemu-timer.h" #include "qemu-char.h" #include "block_int.h" -#include "compatfd.h" #include <assert.h> #ifdef CONFIG_AIO #include "posix-aio-compat.h" @@ -481,7 +480,7 @@ typedef struct RawAIOCB { typedef struct PosixAioState { - int fd; + int rfd, wfd; RawAIOCB *first_aio; } PosixAioState; @@ -490,29 +489,18 @@ static void posix_aio_read(void *opaque) PosixAioState *s = opaque; RawAIOCB *acb, **pacb; int ret; - size_t offset; - union { - struct qemu_signalfd_siginfo siginfo; - char buf[128]; - } sig; - - /* try to read from signalfd, don't freak out if we can't read anything */ - offset = 0; - while (offset < 128) { - ssize_t len; - - len = read(s->fd, sig.buf + offset, 128 - offset); - if (len == -1 && errno == EINTR) - continue; - if (len == -1 && errno == EAGAIN) { - /* there is no natural reason for this to happen, - * so we'll spin hard until we get everything just - * to be on the safe side. */ - if (offset > 0) - continue; - } + ssize_t len; + + /* read all bytes from signal pipe */ + for (;;) { + char bytes[16]; - offset += len; + len = read(s->rfd, bytes, sizeof(bytes)); + if (len == -1 && errno == EINTR) + continue; /* try again */ + if (len == sizeof(bytes)) + continue; /* more to read */ + break; } for(;;) { @@ -559,10 +547,22 @@ static int posix_aio_flush(void *opaque) static PosixAioState *posix_aio_state; +static void aio_signal_handler(int signum) +{ + if (posix_aio_state) { + char byte = 0; + + write(posix_aio_state->wfd, &byte, sizeof(byte)); + } + + qemu_service_io(); +} + static int posix_aio_init(void) { - sigset_t mask; + struct sigaction act; PosixAioState *s; + int fds[2]; struct qemu_paioinit ai; if (posix_aio_state) @@ -570,21 +570,24 @@ static int posix_aio_init(void) s = qemu_malloc(sizeof(PosixAioState)); - /* Make sure to block AIO signal */ - sigemptyset(&mask); - sigaddset(&mask, SIGUSR2); - sigprocmask(SIG_BLOCK, &mask, NULL); + sigfillset(&act.sa_mask); + act.sa_flags = 0; /* do not restart syscalls to interrupt select() */ + act.sa_handler = aio_signal_handler; + sigaction(SIGUSR2, &act, NULL); s->first_aio = NULL; - s->fd = qemu_signalfd(&mask); - if (s->fd == -1) { - fprintf(stderr, "failed to create signalfd\n"); + if (pipe(fds) == -1) { + fprintf(stderr, "failed to create pipe\n"); return -errno; } - fcntl(s->fd, F_SETFL, O_NONBLOCK); + s->rfd = fds[0]; + s->wfd = fds[1]; + + fcntl(s->rfd, F_SETFL, O_NONBLOCK); + fcntl(s->wfd, F_SETFL, O_NONBLOCK); - qemu_aio_set_fd_handler(s->fd, posix_aio_read, NULL, posix_aio_flush, s); + qemu_aio_set_fd_handler(s->rfd, posix_aio_read, NULL, posix_aio_flush, s); memset(&ai, 0, sizeof(ai)); ai.aio_threads = 64;
We don't use signalfd in upstream QEMU. Instead, we always emulate it. It's not necessarily a bad thing to use signalfd, but this is something that should be done upstream. It certainly does qemu-kvm no harm to use the upstream code. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> --- block-raw-posix.c | 71 +++++++++++++++++++++++++++------------------------- 1 files changed, 37 insertions(+), 34 deletions(-)