diff mbox

[03/21] Remove use of signalfd in block-raw-posix.c

Message ID 1241040038-17183-4-git-send-email-aliguori@us.ibm.com (mailing list archive)
State New, archived
Headers show

Commit Message

Anthony Liguori April 29, 2009, 9:20 p.m. UTC
We don't use signalfd in upstream QEMU.  Instead, we always emulate it.

It's not necessarily a bad thing to use signalfd, but this is something that
should be done upstream.  It certainly does qemu-kvm no harm to use the upstream
code.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
---
 block-raw-posix.c |   71 +++++++++++++++++++++++++++-------------------------
 1 files changed, 37 insertions(+), 34 deletions(-)

Comments

Avi Kivity April 30, 2009, 9:28 a.m. UTC | #1
Anthony Liguori wrote:
> We don't use signalfd in upstream QEMU.  Instead, we always emulate it.
>   

With an extra thread -> so an extra context switch.

> It's not necessarily a bad thing to use signalfd, but this is something that
> should be done upstream.  It certainly does qemu-kvm no harm to use the upstream
> code.
>   

It will introduce a (likely minor, but real) performance regression.

Instead of this, why not apply the reverse patch to qemu.git?
Anthony Liguori April 30, 2009, 12:54 p.m. UTC | #2
Avi Kivity wrote:
> Anthony Liguori wrote:
>> We don't use signalfd in upstream QEMU.  Instead, we always emulate it.
>>   
>
> With an extra thread -> so an extra context switch.

We don't use an extra thread.  We just install a signal handler that 
writes to a pipe.  At best, the added overhead is that we get EINTRs 
more often but this is something we already handle.

>> It's not necessarily a bad thing to use signalfd, but this is 
>> something that
>> should be done upstream.  It certainly does qemu-kvm no harm to use 
>> the upstream
>> code.
>>   
>
> It will introduce a (likely minor, but real) performance regression.
>
> Instead of this, why not apply the reverse patch to qemu.git?

I'm not sure signalfd really buys us much.  To emulate it requires 
writing a bunch more data to the pipe.  When writing more than 1 byte, 
we have to worry about whether there's a partial write because the pipe 
buffers full).  We also have to make sure to read from the fd in 
properly sized chunks.

Regards,

Anthony Liguori

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Avi Kivity April 30, 2009, 1:05 p.m. UTC | #3
Anthony Liguori wrote:
> Avi Kivity wrote:
>> Anthony Liguori wrote:
>>> We don't use signalfd in upstream QEMU.  Instead, we always emulate it.
>>>   
>>
>> With an extra thread -> so an extra context switch.
>
> We don't use an extra thread.  We just install a signal handler that 
> writes to a pipe.  At best, the added overhead is that we get EINTRs 
> more often but this is something we already handle.

Oh okay.  But signal delivery is slow; for example the FPU needs to be 
reset.

> I'm not sure signalfd really buys us much.  To emulate it requires 
> writing a bunch more data to the pipe.  When writing more than 1 byte, 
> we have to worry about whether there's a partial write because the 
> pipe buffers full).  We also have to make sure to read from the fd in 
> properly sized chunks.

Then we can use one byte writes (and reads) when signalfd is not 
available.  128 byte pipe read/writes should always be atomic on Linux 
though, likely on other OSes too.
Anthony Liguori April 30, 2009, 1:28 p.m. UTC | #4
Avi Kivity wrote:
> Anthony Liguori wrote:
>> Avi Kivity wrote:
>>> Anthony Liguori wrote:
>>>> We don't use signalfd in upstream QEMU.  Instead, we always emulate 
>>>> it.
>>>>   
>>>
>>> With an extra thread -> so an extra context switch.
>>
>> We don't use an extra thread.  We just install a signal handler that 
>> writes to a pipe.  At best, the added overhead is that we get EINTRs 
>> more often but this is something we already handle.
>
> Oh okay.  But signal delivery is slow; for example the FPU needs to be 
> reset.

Is it really justified to add all of this extra code (including signalfd 
emulation) for something that probably isn't even measurable?

I like using wiz-bang features of Linux as much as the next guy, but I 
think we're stretching to justify it here :-)
Avi Kivity April 30, 2009, 1:36 p.m. UTC | #5
Anthony Liguori wrote:
>>
>> Oh okay.  But signal delivery is slow; for example the FPU needs to 
>> be reset.
>
> Is it really justified to add all of this extra code (including 
> signalfd emulation) for something that probably isn't even measurable?

We don't have to add signalfd emulation; we can simply use signal+pipe 
in that case.

We won't know if it's measurable or not until we measure it (or not).

>
> I like using wiz-bang features of Linux as much as the next guy, but I 
> think we're stretching to justify it here :-)
>

I think it's worth it in this case.  It will become more important in 
time, too.
Avi Kivity May 3, 2009, 4:48 p.m. UTC | #6
Avi Kivity wrote:
> Anthony Liguori wrote:
>>>
>>> Oh okay.  But signal delivery is slow; for example the FPU needs to 
>>> be reset.
>>
>> Is it really justified to add all of this extra code (including 
>> signalfd emulation) for something that probably isn't even measurable?
>
> We don't have to add signalfd emulation; we can simply use signal+pipe 
> in that case.
>
> We won't know if it's measurable or not until we measure it (or not).
>
>>
>> I like using wiz-bang features of Linux as much as the next guy, but 
>> I think we're stretching to justify it here :-)
>>
>
> I think it's worth it in this case.  It will become more important in 
> time, too.
>

Out of curiosity, I measured this:

[avi@balrog test]$ ./signal
 2777 ns/signal (pipe)
  844 ns/signal (signalfd)


At 10000 signals/sec, this will save about 2% cpu time.  It's definitely 
worthwhile for the handful of lines it takes.

test program:

#include <signal.h>
#include <unistd.h>
#include <sys/signalfd.h>
#include <sys/time.h>
#include <stdio.h>

static int wfd;

static void handler(int signum)
{
    char b;

    write(wfd, &b, 1);
}

static int create_pipe(void)
{
    int fd[2];
    sigset_t s;

    pipe(fd);
    wfd = fd[1];
    signal(SIGUSR1, handler);
    sigemptyset(&s);
    sigaddset(&s, SIGUSR1);
    sigprocmask(SIG_UNBLOCK, &s, NULL);
    return fd[0];
}

static int create_signalfd(void)
{
    sigset_t s;

    sigemptyset(&s);
    sigaddset(&s, SIGUSR1);
    sigprocmask(SIG_BLOCK, &s, NULL);
    return signalfd(-1, &s, 0);
}

static uint64_t time_usec(void)
{
    struct timeval tv;

    gettimeofday(&tv, NULL);
    return (uint64_t)tv.tv_sec * 1000000 + tv.tv_usec;
}

#define N 10000000

static void test(const char *name, int fd, int len)
{
    int i;
    uint64_t t1, t2;
    char buf[128];

    t1 = time_usec();
    for (i = 0; i < N; ++i) {
        raise(SIGUSR1);
        read(fd, buf, len);
    }
    t2 = time_usec();

    close(fd);

    printf("%5d ns/signal (%s)\n", 1000 * (t2 - t1) / N, name);
}

int main(int ac, char **av)
{
    test("pipe", create_pipe(), 1);
    test("signalfd", create_signalfd(), 128);
    return 0;
}
diff mbox

Patch

diff --git a/block-raw-posix.c b/block-raw-posix.c
index f033bae..822839f 100644
--- a/block-raw-posix.c
+++ b/block-raw-posix.c
@@ -25,7 +25,6 @@ 
 #include "qemu-timer.h"
 #include "qemu-char.h"
 #include "block_int.h"
-#include "compatfd.h"
 #include <assert.h>
 #ifdef CONFIG_AIO
 #include "posix-aio-compat.h"
@@ -481,7 +480,7 @@  typedef struct RawAIOCB {
 
 typedef struct PosixAioState
 {
-    int fd;
+    int rfd, wfd;
     RawAIOCB *first_aio;
 } PosixAioState;
 
@@ -490,29 +489,18 @@  static void posix_aio_read(void *opaque)
     PosixAioState *s = opaque;
     RawAIOCB *acb, **pacb;
     int ret;
-    size_t offset;
-    union {
-        struct qemu_signalfd_siginfo siginfo;
-        char buf[128];
-    } sig;
-
-    /* try to read from signalfd, don't freak out if we can't read anything */
-    offset = 0;
-    while (offset < 128) {
-        ssize_t len;
-
-        len = read(s->fd, sig.buf + offset, 128 - offset);
-        if (len == -1 && errno == EINTR)
-            continue;
-        if (len == -1 && errno == EAGAIN) {
-            /* there is no natural reason for this to happen,
-             * so we'll spin hard until we get everything just
-             * to be on the safe side. */
-            if (offset > 0)
-                continue;
-        }
+    ssize_t len;
+
+    /* read all bytes from signal pipe */
+    for (;;) {
+        char bytes[16];
 
-        offset += len;
+        len = read(s->rfd, bytes, sizeof(bytes));
+        if (len == -1 && errno == EINTR)
+            continue; /* try again */
+        if (len == sizeof(bytes))
+            continue; /* more to read */
+        break;
     }
 
     for(;;) {
@@ -559,10 +547,22 @@  static int posix_aio_flush(void *opaque)
 
 static PosixAioState *posix_aio_state;
 
+static void aio_signal_handler(int signum)
+{
+    if (posix_aio_state) {
+        char byte = 0;
+
+        write(posix_aio_state->wfd, &byte, sizeof(byte));
+    }
+
+    qemu_service_io();
+}
+
 static int posix_aio_init(void)
 {
-    sigset_t mask;
+    struct sigaction act;
     PosixAioState *s;
+    int fds[2];
     struct qemu_paioinit ai;
   
     if (posix_aio_state)
@@ -570,21 +570,24 @@  static int posix_aio_init(void)
 
     s = qemu_malloc(sizeof(PosixAioState));
 
-    /* Make sure to block AIO signal */
-    sigemptyset(&mask);
-    sigaddset(&mask, SIGUSR2);
-    sigprocmask(SIG_BLOCK, &mask, NULL);
+    sigfillset(&act.sa_mask);
+    act.sa_flags = 0; /* do not restart syscalls to interrupt select() */
+    act.sa_handler = aio_signal_handler;
+    sigaction(SIGUSR2, &act, NULL);
 
     s->first_aio = NULL;
-    s->fd = qemu_signalfd(&mask);
-    if (s->fd == -1) {
-        fprintf(stderr, "failed to create signalfd\n");
+    if (pipe(fds) == -1) {
+        fprintf(stderr, "failed to create pipe\n");
         return -errno;
     }
 
-    fcntl(s->fd, F_SETFL, O_NONBLOCK);
+    s->rfd = fds[0];
+    s->wfd = fds[1];
+
+    fcntl(s->rfd, F_SETFL, O_NONBLOCK);
+    fcntl(s->wfd, F_SETFL, O_NONBLOCK);
 
-    qemu_aio_set_fd_handler(s->fd, posix_aio_read, NULL, posix_aio_flush, s);
+    qemu_aio_set_fd_handler(s->rfd, posix_aio_read, NULL, posix_aio_flush, s);
 
     memset(&ai, 0, sizeof(ai));
     ai.aio_threads = 64;